Affiliation:
1. North Carolina State University
Abstract
Multimedia applications require a significantly higher level of performance than previous embedded-system workloads. They have driven digital signal processor (DSP) makers to adopt high-performance architectures such as VLIW (Very Long Instruction Word). Despite many efforts to exploit instruction-level parallelism (ILP) in the application, the achieved speed is a fraction of what it could be, limited by the difficulty of finding enough independent instructions to keep all of the processor's functional units busy.
This article proposes Software Thread Integration (STI) for instruction-level parallelism. STI is a software technique for interleaving multiple threads of control into a single implicitly multithreaded one. We use STI to improve performance on ILP processors by merging parallel procedures into one, increasing the compiler's scope and hence allowing it to create a more efficient instruction schedule. Assuming the parallel procedures are given, we define a methodology for finding the best-performing integrated procedure with minimal compilation time.
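To make the idea of merging parallel procedures concrete, here is a minimal sketch in C. It is not taken from the article: the procedure names `scale`, `sum_sq`, and `scale_and_sum_sq` are hypothetical, and the integration shown is a simple jamming of two independent loops with clean-up loops for the leftover iterations. It assumes the two arrays do not overlap, so the two loop bodies are independent.

```c
#include <stddef.h>

/* Hypothetical thread 1: scale every element of a[] by k. */
void scale(int *a, size_t n, int k) {
    for (size_t i = 0; i < n; i++)
        a[i] *= k;
}

/* Hypothetical thread 2: return the sum of squares of b[]. */
long sum_sq(const int *b, size_t m) {
    long acc = 0;
    for (size_t j = 0; j < m; j++)
        acc += (long)b[j] * b[j];
    return acc;
}

/*
 * Hypothetical integrated procedure: does the work of scale() and sum_sq()
 * in one body. Assumption: a[] and b[] do not overlap, so the statements
 * from the two threads have no dependences on each other and a scheduler
 * is free to place them in the same instruction word.
 */
long scale_and_sum_sq(int *a, size_t n, int k, const int *b, size_t m) {
    size_t common = n < m ? n : m;   /* iterations both threads share */
    long acc = 0;
    size_t i = 0;

    /* Jammed loop: one iteration of each thread per trip. */
    for (; i < common; i++) {
        a[i] *= k;                   /* from thread 1 */
        acc += (long)b[i] * b[i];    /* from thread 2 */
    }

    /* Clean-up loops: at most one of these runs any iterations. */
    for (size_t r = i; r < n; r++)
        a[r] *= k;
    for (size_t r = i; r < m; r++)
        acc += (long)b[r] * b[r];

    return acc;
}
```

Calling `scale_and_sum_sq(a, n, k, b, m)` leaves `a` as `scale(a, n, k)` would and returns what `sum_sq(b, m)` would. The point of the merged body is only that the compiler sees both threads' instructions in one scheduling region; whether that actually yields a better schedule depends on the target and the code.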
We quantitatively estimate the performance impact of integration, allowing various integration scenarios to be compared and ranked via profitability analysis. During integration of threads, different ILP-improving code transformations are selectively applied according to the control structure and the ILP characteristics of the code, driven by interactions with software pipelining. The estimated profitability is verified and corrected by an iterative compilation approach, compensating for possible estimation inaccuracy. Our modeling methods combined with limited compilation quickly find the best integration scenario without requiring exhaustive integration.
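The estimate-then-verify pattern described above can be sketched as follows. The abstract does not give the profitability model or the compilation loop, so everything here is an assumption made for illustration: the `Scenario` type, the `pick_scenario` function, the `budget` parameter (how many top-ranked scenarios get compiled), and `demo_measure`, which stands in for "compile this scenario and count its cycles" with a made-up table.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical integration scenario with a model-estimated cost. */
typedef struct {
    int id;              /* scenario identifier */
    double est_cycles;   /* estimated cycles; lower is better */
} Scenario;

/* Stand-in type for "compile scenario id and measure its cycles". */
typedef double (*MeasureFn)(int id);

/* qsort comparator: ascending estimated cycles. */
static int by_estimate(const void *x, const void *y) {
    double a = ((const Scenario *)x)->est_cycles;
    double b = ((const Scenario *)y)->est_cycles;
    return (a > b) - (a < b);
}

/*
 * Rank scenarios by estimate, then measure only the `budget` best-ranked
 * ones and return the id with the lowest measured cost. Returns -1 when
 * there is nothing to measure. Sorts the array s in place.
 */
int pick_scenario(Scenario *s, size_t n, size_t budget, MeasureFn measure) {
    if (n == 0 || budget == 0)
        return -1;
    qsort(s, n, sizeof *s, by_estimate);
    if (budget > n)
        budget = n;

    int best = s[0].id;
    double best_cycles = measure(s[0].id);
    for (size_t i = 1; i < budget; i++) {
        double c = measure(s[i].id);
        if (c < best_cycles) {       /* measurement overrides the estimate */
            best_cycles = c;
            best = s[i].id;
        }
    }
    return best;
}

/* Made-up measurements for ids 0..3, used only to exercise the sketch. */
double demo_measure(int id) {
    static const double measured[] = {120.0, 95.0, 101.0, 140.0};
    return measured[id];
}
```

With estimates of 110, 100, 90, and 150 cycles for ids 0 to 3 and a budget of 2, only ids 2 and 1 are measured; the made-up measurements (101 and 95) reverse their estimated order, so id 1 is chosen. A larger budget trades compilation time for robustness against estimation error.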
Funder
National Science Foundation
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
Cited by
1 article.