GEVO

Authors:

Liou Jhe-Yu¹, Wang Xiaodong², Forrest Stephanie³, Wu Carole-Jean⁴

Affiliations:

1. Arizona State University, Tempe, AZ

2. Facebook, Menlo Park, CA

3. Arizona State University and Santa Fe Institute, Santa Fe, NM

4. Arizona State University and Facebook, Menlo Park, CA

Abstract

GPUs are a key enabler of the revolution in machine learning and high-performance computing, functioning as de facto co-processors to accelerate large-scale computation. As the programming stack and tool support have matured, GPUs have also become accessible to programmers, who may lack detailed knowledge of the underlying architecture and fail to fully leverage the GPU’s computation power. GEVO (Gpu optimization using EVOlutionary computation) is a tool for automatically discovering optimization opportunities and tuning the performance of GPU kernels in the LLVM representation. GEVO uses population-based search to find edits to GPU code compiled to LLVM-IR and improves performance on desired criteria while retaining required functionality. We demonstrate that GEVO improves the execution time of general-purpose GPU programs and machine learning (ML) models on NVIDIA Tesla P100. For the Rodinia benchmarks, GEVO improves GPU kernel runtime performance by an average of 49.48% and by as much as 412% over the fully compiler-optimized baseline. If kernel output accuracy is relaxed to tolerate up to 1% error, GEVO can find kernel variants that outperform the baseline by an average of 51.08%. For the ML workloads, GEVO achieves kernel performance improvement for SVM on the MNIST handwriting recognition (3.24×) and the a9a income prediction (2.93×) datasets with no loss of model accuracy. GEVO achieves 1.79× kernel performance improvement on image classification using ResNet18/CIFAR-10, with less than 1% model accuracy reduction.
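
The abstract's description of the search can be read as a standard genetic-improvement loop: keep a population of candidate edit lists over the kernel's LLVM-IR, score each candidate by measured runtime while rejecting variants whose output drifts past the error tolerance, and breed the fastest survivors. The Python sketch below illustrates only that loop shape and is not GEVO's implementation; every name in it (random_edit, evaluate, the edit operators, the population sizes) is a made-up stand-in. In particular, evaluate() here returns an arbitrary placeholder score, whereas a real evaluator would apply the edits to the kernel's LLVM-IR, recompile, run the kernel on the GPU, and verify its output before reporting runtime.

import random

POP_SIZE = 32
GENERATIONS = 50
EDIT_OPS = ("copy", "delete", "swap")   # coarse stand-ins for IR-level edit operators
NUM_SITES = 100                         # pretend the kernel has 100 editable IR instructions

def random_edit():
    # An edit is (operation, source instruction index, destination instruction index).
    return (random.choice(EDIT_OPS), random.randrange(NUM_SITES), random.randrange(NUM_SITES))

def evaluate(edits):
    # Placeholder fitness (lower is better). A real evaluator would splice the edits
    # into the kernel's LLVM-IR, recompile, execute on the GPU, reject outputs outside
    # the allowed error tolerance, and return the measured runtime.
    return sum(hash(e) % 1000 for e in edits) / (1 + len(edits))

def evolve():
    population = [[random_edit()] for _ in range(POP_SIZE)]
    for _ in range(GENERATIONS):
        population.sort(key=evaluate)
        survivors = population[: POP_SIZE // 2]            # truncation selection
        children = []
        while len(survivors) + len(children) < POP_SIZE:
            a, b = random.sample(survivors, 2)
            child = a[: len(a) // 2] + b[len(b) // 2:]     # one-point crossover of edit lists
            child.append(random_edit())                    # mutation: append a fresh random edit
            children.append(child)
        population = survivors + children
    return min(population, key=evaluate)

if __name__ == "__main__":
    print("best edit list:", evolve())

The relaxed-accuracy results quoted in the abstract (kernel outputs allowed to differ by up to 1% error) can be thought of as loosening that output check in the evaluator while leaving the search loop itself unchanged.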

Funder

Defense Advanced Research Projects Agency

National Science Foundation

Air Force Research Laboratory

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Cited by 10 articles.

1. Genetic Improvement of Last Level Cache. Lecture Notes in Computer Science, 2024.

2. Jaws 30. Genetic Programming and Evolvable Machines, 2023-11-22.

3. Iterative genetic improvement: Scaling stochastic program synthesis. Artificial Intelligence, 2023-09.

4. The Impact of Code Bloat on Genetic Program Comprehension: Replication of a Controlled Experiment on Semantic Inference. Mathematics, 2023-08-31.

5. Genetic Improvement of OLC and H3 with Magpie. 2023 IEEE/ACM International Workshop on Genetic Improvement (GI), 2023-05.
