Affiliation:
1. University of Technology Sydney (UTS), Australia
2. University of New South Wales (UNSW), Australia
Abstract
Compiler-based vectorization represents a promising solution to automatically generate code that makes efficient use of modern CPUs with SIMD extensions. Two main auto-vectorization techniques, superword-level parallelism vectorization (SLP) and loop-level vectorization (LLV), require precise dependence analysis on arrays and structs to vectorize isomorphic scalar instructions (in the case of SLP) and reduce dynamic dependence checks at runtime (in the case of LLV).
The alias analyses used in modern vectorizing compilers are either intra-procedural (without tracking inter-procedural data-flows) or inter-procedural (by using field-sensitive models, which are too imprecise in handling arrays and structs). This article proposes an inter-procedural
L
oop-oriented
P
ointer
A
nalysis for C, called L
pa
, for analyzing arrays and structs to support aggressive SLP and LLV optimizations effectively. Unlike field-insensitive solutions that pre-allocate objects for each memory allocation site, our approach uses a lazy memory model to generate
access-based location sets
based on how structs and arrays are accessed. L
pa
can precisely analyze arrays and nested aggregate structures to enable SIMD optimizations for large programs. By separating the location set generation as an independent concern from the rest of the pointer analysis, L
pa
is designed so that existing points-to resolution algorithms (e.g., flow-insensitive and flow-sensitive pointer analysis) can be reused easily.
We have implemented L
pa
fully in the LLVM compiler infrastructure (version 3.8.0). We evaluate L
pa
by considering SLP and LLV, the two classic vectorization techniques, on a set of 20 C and Fortran CPU2000/2006 benchmarks. For SLP, L
pa
outperforms LLVM’s BasicAA and ScevAA by discovering 139 and 273 more vectorizable basic blocks, respectively, resulting in the best speedup of 2.95% for 173.applu. For LLV, LLVM introduces totally 551 and 652 static bound checks under BasicAA and ScevAA, respectively. In contrast, L
pa
has reduced these static checks to 220, with an average of 15.7 checks per benchmark, resulting in the best speedup of 7.23% for 177.mesa.
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture,Software
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
1. SPATA: Effective OS Bug Detection with Summary-Based, Alias-Aware and Path-Sensitive Typestate Analysis;ACM Transactions on Computer Systems;2024-09-06
2. Path-sensitive and alias-aware typestate analysis for detecting OS bugs;Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems;2022-02-22
3. Duplo: a framework for OCaml post-link optimisation;Proceedings of the ACM on Programming Languages;2020-08-02
4. The computer for the 21st century: present security & privacy challenges;Journal of Internet Services and Applications;2018-12