Affiliation:
1. The Ohio State University
Abstract
Automatic vectorization is critical to enhancing the performance of compute-intensive programs on modern processors. However, the auto-vectorization capabilities of current production compilers leave much room for improvement, which careful vector-code synthesis using a variety of loop transformations (e.g., unroll-and-jam and loop interchange) can exploit.
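To illustrate one of the transformations named above, the sketch below applies unroll-and-jam to a simple matrix-vector loop nest: the outer loop is unrolled by 2 and the two resulting copies of the inner loop are fused ("jammed"), so each loaded `x[j]` is reused by two rows. The example is a minimal sketch under stated assumptions: the kernel, the unroll factor of 2, and the function names are hypothetical choices for illustration, and Python shows only the structure of the transformation, not its SIMD performance effect.

```python
def matvec_original(A, x):
    """Reference loop nest: y[i] = sum_j A[i][j] * x[j]."""
    n, m = len(A), len(x)
    y = [0.0] * n
    for i in range(n):
        for j in range(m):
            y[i] += A[i][j] * x[j]
    return y


def matvec_unroll_and_jam(A, x):
    """Outer i-loop unrolled by 2 (hypothetical factor); the two inner j-loops are jammed."""
    n, m = len(A), len(x)
    y = [0.0] * n
    # Main loop: handles rows in pairs (i, i + 1).
    for i in range(0, n - 1, 2):
        for j in range(m):
            xj = x[j]  # loaded once, reused by both unrolled rows
            y[i] += A[i][j] * xj
            y[i + 1] += A[i + 1][j] * xj
    # Remainder: the last row when n is odd.
    if n % 2 == 1:
        i = n - 1
        for j in range(m):
            y[i] += A[i][j] * x[j]
    return y
```

Because each `y[i]` is still accumulated over `j` in the same order, both versions produce identical results; the transformation only changes how work is grouped, which is what gives a vectorizer more independent operations to pack into SIMD registers.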
As the set of transformations considered grows, selecting the most effective combination of transformations becomes a significant challenge: the cost models currently used in vectorizing compilers are often unable to identify the best choices. In this paper, we address this problem using machine learning models to predict the performance of SIMD codes. In contrast to existing approaches that have used high-level features of the program, we develop machine learning models based on features extracted from the generated assembly code. The models are trained offline on a number of benchmarks and used at compile time to discriminate between numerous possible vectorized variants generated from the input code.
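The selection step described above (extract features from each variant's assembly, predict its cost with an offline-trained model, keep the best variant) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the instruction classes in `FEATURE_PATTERNS`, the linear cost model, and the weights passed to it are all assumptions, since the abstract does not specify the feature set or the model type.

```python
import re
from collections import Counter

# Hypothetical instruction classes for x86 AVX mnemonics; the paper's actual
# feature set is not given in the abstract.
FEATURE_PATTERNS = [
    ("vec_fma", re.compile(r"vfmadd")),
    ("vec_mem", re.compile(r"vmov[au]p[sd]")),
    ("vec_shuffle", re.compile(r"v(?:shuf|perm|unpck|broadcast)")),
    ("vec_arith", re.compile(r"v(?:add|sub|mul)p[sd]")),
]
FEATURES = [name for name, _ in FEATURE_PATTERNS] + ["other"]


def extract_features(asm: str) -> Counter:
    """Count instructions per class in one variant's assembly listing."""
    counts = Counter()
    for line in asm.splitlines():
        line = line.strip()
        # Skip blank lines, labels, assembler directives, and comments.
        if not line or line.endswith(":") or line.startswith((".", "#")):
            continue
        mnemonic = line.split()[0]
        for name, pattern in FEATURE_PATTERNS:
            if pattern.match(mnemonic):
                counts[name] += 1
                break
        else:
            counts["other"] += 1
    return counts


def predicted_cost(features: Counter, weights: dict, bias: float = 0.0) -> float:
    """Hypothetical linear model: in practice the weights would be learned offline."""
    return bias + sum(weights.get(f, 0.0) * features[f] for f in FEATURES)


def select_best(variants: dict, weights: dict) -> str:
    """Return the name of the variant with the lowest predicted cost."""
    return min(variants, key=lambda k: predicted_cost(extract_features(variants[k]), weights))
```

A linear model keeps the sketch short; any regressor trained offline on measured run times of benchmark variants could take its place in `predicted_cost`.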
We demonstrate the effectiveness of the machine learning model by using it to guide automatic vectorization on a variety of tensor contraction kernels, with improvements ranging from 2× to 8× over Intel ICC's auto-vectorized code. We also evaluate the effectiveness of the model on a number of stencil computations and show good improvement over auto-vectorized code.
Funder
National Science Foundation
Division of Computing and Communication Foundations
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
34 articles.
1. Automation of Brain Tumor Segmentation Using Deep Learning;Advanced Technologies and Societal Change;2023
2. Improving Vectorization Heuristics in a Dynamic Compiler with Machine Learning Models;Proceedings of the 14th ACM SIGPLAN International Workshop on Virtual Machines and Intermediate Languages;2022-11-29
3. Reinforcement Learning assisted Loop Distribution for Locality and Vectorization;2022 IEEE/ACM Eighth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC);2022-11
4. Optimal Launch Bound Selection in CPU-GPU Hybrid Graph Applications with Deep Learning;2022 IEEE 13th International Green and Sustainable Computing Conference (IGSC);2022-10-24
5. VICO;Proceedings of the 36th ACM International Conference on Supercomputing;2022-06-28