GEVO

Authors:

Jhe-Yu Liou¹, Xiaodong Wang², Stephanie Forrest³, Carole-Jean Wu⁴

Affiliations:

1. Arizona State University, Tempe, AZ

2. Facebook, Menlo Park, CA

3. Arizona State University and Santa Fe Institute, Santa Fe, NM

4. Arizona State University and Facebook, Menlo Park, CA

Abstract

GPUs are a key enabler of the revolution in machine learning and high-performance computing, functioning as de facto co-processors to accelerate large-scale computation. As the programming stack and tool support have matured, GPUs have also become accessible to programmers, who may lack detailed knowledge of the underlying architecture and fail to fully leverage the GPU’s computation power. GEVO (Gpu optimization using EVOlutionary computation) is a tool for automatically discovering optimization opportunities and tuning the performance of GPU kernels in the LLVM representation. GEVO uses population-based search to find edits to GPU code compiled to LLVM-IR and improves performance on desired criteria while retaining required functionality. We demonstrate that GEVO improves the execution time of general-purpose GPU programs and machine learning (ML) models on NVIDIA Tesla P100. For the Rodinia benchmarks, GEVO improves GPU kernel runtime performance by an average of 49.48% and by as much as 412% over the fully compiler-optimized baseline. If kernel output accuracy is relaxed to tolerate up to 1% error, GEVO can find kernel variants that outperform the baseline by an average of 51.08%. For the ML workloads, GEVO achieves kernel performance improvement for SVM on the MNIST handwriting recognition (3.24×) and the a9a income prediction (2.93×) datasets with no loss of model accuracy. GEVO achieves 1.79× kernel performance improvement on image classification using ResNet18/CIFAR-10, with less than 1% model accuracy reduction.
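
The abstract describes GEVO's core mechanism: a population of candidate patches, where each patch is a sequence of edits to the kernel's LLVM-IR, evaluated for runtime and functional correctness and evolved through selection, crossover, and mutation. The Python sketch below illustrates that search loop in miniature. It is only an illustration under stated assumptions: the edit representation, the simulated evaluate function, and every other name here are hypothetical stand-ins that mimic measuring a patched kernel; GEVO itself applies real LLVM-IR edits and measures actual GPU kernel runtime and output error.

```python
# Minimal, runnable sketch of a GEVO-style population-based search.
# All components below are hypothetical stand-ins for illustration only:
# evaluation is simulated instead of compiling LLVM-IR and running a GPU kernel.
import random

EDIT_KINDS = ("delete", "swap", "copy")   # coarse edit operators over IR instructions
KERNEL_SIZE = 200                          # pretend number of IR instructions in the kernel
BASELINE_RUNTIME = 100.0                   # pretend runtime (ms) of the compiler-optimized baseline


def random_edit():
    """One edit: an operator name plus a random IR instruction index."""
    return (random.choice(EDIT_KINDS), random.randrange(KERNEL_SIZE))


def evaluate(patch):
    """Return (runtime, functional) for a patch.

    Stand-in for compiling the patched LLVM-IR, timing the GPU kernel, and
    checking its output against the unmodified kernel. Here the result is just
    a deterministic pseudo-random function of the patch contents.
    """
    rng = random.Random(hash(tuple(patch)))
    runtime = BASELINE_RUNTIME - sum(rng.uniform(-2.0, 3.0) for _ in patch)
    functional = all(rng.random() > 0.05 for _ in patch)  # most simulated edits are safe
    return runtime, functional


def mutate(patch):
    """Grow or shrink a patch by one edit."""
    child = list(patch)
    if child and random.random() < 0.5:
        child.pop(random.randrange(len(child)))
    else:
        child.append(random_edit())
    return child


def crossover(a, b):
    """Exchange edit-list tails between two parent patches."""
    cut_a, cut_b = random.randrange(len(a) + 1), random.randrange(len(b) + 1)
    return a[:cut_a] + b[cut_b:], b[:cut_b] + a[cut_a:]


def search(pop_size=32, generations=20):
    """Population-based search: keep the fastest variants that still pass the test."""
    population = [[random_edit()] for _ in range(pop_size)]
    best_patch, best_runtime = [], BASELINE_RUNTIME
    for _ in range(generations):
        scored = []
        for patch in population:
            runtime, functional = evaluate(patch)
            fitness = runtime if functional else float("inf")  # reject broken variants
            scored.append((fitness, patch))
            if functional and runtime < best_runtime:
                best_patch, best_runtime = patch, runtime
        scored.sort(key=lambda s: s[0])
        parents = [p for _, p in scored[: pop_size // 2]]       # truncation selection
        population = []
        while len(population) < pop_size:
            a, b = random.sample(parents, 2)
            c1, c2 = crossover(a, b)
            population += [mutate(c1), mutate(c2)]
    return best_patch, best_runtime


if __name__ == "__main__":
    patch, runtime = search()
    print(f"best simulated patch ({len(patch)} edits): {patch}")
    print(f"simulated runtime: {runtime:.1f} ms (baseline {BASELINE_RUNTIME} ms)")
```

Running the script prints the best simulated patch and its runtime. In a real setting, evaluate would compile the patched IR with the GPU toolchain, execute the kernel, time it, and compare the output against the unmodified kernel, optionally allowing the relaxed error tolerance mentioned in the abstract.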

Funder

Defense Advanced Research Projects Agency

National Science Foundation

Air Force Research Laboratory

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Information Systems, Software

Cited by 9 articles.

1. Jaws 30. Genetic Programming and Evolvable Machines, 2023-11-22.

2. Iterative genetic improvement: Scaling stochastic program synthesis. Artificial Intelligence, 2023-09.

3. The Impact of Code Bloat on Genetic Program Comprehension: Replication of a Controlled Experiment on Semantic Inference. Mathematics, 2023-08-31.

4. Genetic Improvement of OLC and H3 with Magpie. 2023 IEEE/ACM International Workshop on Genetic Improvement (GI), 2023-05.

5. Genetic Improvement of LLVM Intermediate Representation. Lecture Notes in Computer Science, 2023.
