Affiliation:
1. UNSW Australia/NUDT, China
2. UNSW Australia, NSW, Australia
Abstract
Existing vectorization techniques are ineffective for loops that exhibit little loop-level parallelism but some limited superword-level parallelism (SLP). We show that effectively vectorizing such loops requires partial vector operations to be executed correctly and efficiently, where the degree of partial SIMD parallelism is smaller than the SIMD datapath width. We present a simple yet effective SLP compiler technique called Paver (PArtial VEctorizeR), formulated and implemented in LLVM as a generalization of the traditional SLP algorithm, to optimize such partially vectorizable loops. The key idea is to maximize SIMD utilization by widening the vector instructions used while minimizing the overheads caused by memory access, packing/unpacking, and/or masking operations, without introducing new memory errors or new numeric exceptions. For a set of 9 C/C++/Fortran applications with partial SIMD parallelism, Paver achieves significantly better kernel and whole-program speedups than LLVM on both Intel’s AVX and ARM’s NEON.
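To make the notion of partial SIMD parallelism concrete, the following is a minimal hand-written sketch in C, not taken from the paper and not Paver's actual output. It assumes a 4-lane float datapath and a loop with a carried dependence whose body contains three isomorphic, independent statements. The function names `integrate_scalar` and `integrate_padded` and the array layout are hypothetical. The second function emulates a partial vector operation by packing three elements into a 4-lane temporary with a neutral value in the unused lane, which avoids reading past the end of the input array.

```c
#include <stddef.h>

/* Scalar form. Each iteration depends on the previous one through pos,
   so there is little loop-level parallelism. The three statements in the
   body are isomorphic and independent: 3 lanes of SLP on a 4-lane datapath. */
void integrate_scalar(float pos[3], const float *vel, size_t n, float damp) {
    for (size_t i = 0; i < n; i++) {
        pos[0] = pos[0] * damp + vel[3 * i + 0];
        pos[1] = pos[1] * damp + vel[3 * i + 1];
        pos[2] = pos[2] * damp + vel[3 * i + 2];
    }
}

/* Emulated 4-lane form (illustrative assumption, plain C arrays stand in
   for vector registers). Lane 3 is padding and is never stored back. */
void integrate_padded(float pos[3], const float *vel, size_t n, float damp) {
    float acc[4] = { pos[0], pos[1], pos[2], 0.0f };
    for (size_t i = 0; i < n; i++) {
        /* Pack exactly three elements. A full 4-wide load would touch
           vel[3 * i + 3], which is out of bounds on the last iteration. */
        float v[4] = { vel[3 * i + 0], vel[3 * i + 1], vel[3 * i + 2], 0.0f };
        for (int lane = 0; lane < 4; lane++) {
            /* One "wide" multiply-add; the padding lane stays at 0. */
            acc[lane] = acc[lane] * damp + v[lane];
        }
    }
    /* Unpack only the three live lanes. */
    pos[0] = acc[0];
    pos[1] = acc[1];
    pos[2] = acc[2];
}
```

The packing step and the wasted fourth lane are the kinds of overhead the abstract refers to; whether the wide form pays off depends on how cheaply the target can perform them.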
Funder
Australian Research Council
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by
28 articles.
1. Optimizing Stencil Computation on Multi-core DSPs;Proceedings of the 53rd International Conference on Parallel Processing;2024-08-12
2. Boost Linear Algebra Computation Performance via Efficient VNNI Utilization;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3;2024-04-27
3. PresCount: Effective Register Allocation for Bank Conflict Reduction;2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO);2024-03-02
4. Occamy: Elastically Sharing a SIMD Co-processor across Multiple CPU Cores;Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3;2023-03-25
5. High Performance and Power Efficient Accelerator for Cloud Inference;2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA);2023-02