Affiliation:
1. RWTH Aachen University, Germany
Abstract
Retargetable C compilers are currently widely used to quickly obtain compiler support for new embedded processors and to perform early processor architecture exploration. A partially inherent problem of the retargetable compilation approach, though, is the limited code quality compared to hand-written compilers or assembly code, due to the lack of dedicated optimization techniques. This problem can be circumvented by designing flexible, retargetable code optimization techniques that apply to a certain range of target architectures. This article focuses on target machines with SIMD instruction support, a common feature in embedded processors for multimedia applications. However, SIMD optimization is known to be a difficult task, since SIMD architectures are largely nonuniform, support only a limited set of data types, and impose several memory alignment constraints. Additionally, such techniques require complicated loop transformations, tailored to the SIMD architecture, in order to expose the necessary amount of parallelism in the code. Thus, integrating the SIMD optimization and the required loop transformations in a single retargeting formalism is an ambitious challenge. In this article, we present an efficient and quickly retargetable SIMD code optimization framework that is integrated into an industrial retargetable C compiler. Experimental results for different processors demonstrate that the proposed technique applies to real-life target machines and that it produces code quality improvements close to the theoretical limit.
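To make the loop-level idea more concrete, here is a minimal, hypothetical sketch. It is not the framework described in the article. It emulates a 2-lane, 16-bit SIMD addition inside a 32-bit word ("SIMD within a register") and shows the typical shape of a SIMD loop transformation: the loop is unrolled by the SIMD width, adjacent elements are packed into one word, and a scalar epilogue handles leftover iterations. The names `add_scalar`, `add2x16`, and `add_simd2` are illustrative only. A real target would use dedicated SIMD instructions, and many such targets impose alignment constraints that the `memcpy`-based loads here deliberately sidestep.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reference scalar loop: one 16-bit (wraparound) addition per iteration. */
static void add_scalar(const uint16_t *a, const uint16_t *b, uint16_t *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        c[i] = (uint16_t)(a[i] + b[i]);
}

/* Emulated 2x16-bit SIMD add: adds the low 15 bits of each lane (no carry can
   cross a lane boundary), then fixes each lane's top bit with an XOR.
   Any carry out of a lane's top bit is discarded, i.e. lanes wrap around. */
static uint32_t add2x16(uint32_t x, uint32_t y) {
    uint32_t low15 = (x & 0x7FFF7FFFu) + (y & 0x7FFF7FFFu);
    return low15 ^ ((x ^ y) & 0x80008000u);
}

/* Vectorized loop: unrolled by the SIMD width (2 lanes), plus a scalar epilogue. */
static void add_simd2(const uint16_t *a, const uint16_t *b, uint16_t *c, size_t n) {
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        uint32_t x, y, z;
        memcpy(&x, &a[i], sizeof x);   /* packed load of a[i], a[i+1] */
        memcpy(&y, &b[i], sizeof y);   /* packed load of b[i], b[i+1] */
        z = add2x16(x, y);             /* both lanes added in one operation */
        memcpy(&c[i], &z, sizeof z);   /* packed store of c[i], c[i+1] */
    }
    for (; i < n; i++)                 /* epilogue for an odd trip count */
        c[i] = (uint16_t)(a[i] + b[i]);
}
```

The packed word is stored back with the same `memcpy` layout it was loaded with, so the result does not depend on byte order. Calling `add_scalar` and `add_simd2` on the same inputs should yield identical arrays, including wraparound cases such as `0xFFFF + 1`.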
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Information Systems, Software
Cited by 12 articles
1. When Function Inlining Meets WebAssembly: Counterintuitive Impacts on Runtime Performance;Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering;2023-11-30
2. A Case Study of Performance Optimization in a Heterogeneous Environment;2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW);2017-10
3. Vectorization in PyPy's Tracing Just-In-Time Compiler;Proceedings of the 19th International Workshop on Software and Compilers for Embedded Systems;2016-05-23
4. Evaluating vector data type usage in OpenCL kernels;Concurrency and Computation: Practice and Experience;2014-10-23
5. C Compilers and Code Optimization for DSPs;Handbook of Signal Processing Systems;2013