Affiliation:
1. University of Maryland, College Park, MD
2. Embedded Research Solutions, Columbia, MD
Abstract
This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided because of their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the software is responsible for allocating data to the different memory units so as to maximize performance. Current practice typically leaves it to the programmer to partition the data among the memory units. We present a compiler strategy that partitions the data among the memory units automatically. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy instead of a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation instead of a simpler greedy strategy; both results are for an SRAM size equal to 20% of the total data size. For some programs, placing less than 5% of the data in SRAM achieves a similar speedup.
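The gap between a greedy allocator and an exact one can be illustrated with a deliberately simplified model. The sketch below assumes only two memory units (a fast SRAM and a slow DRAM), hypothetical per-access latencies, and per-variable access counts taken from a profile run; under those assumptions, static allocation of global variables reduces to a 0/1 knapsack problem. It solves that problem exactly with dynamic programming rather than with the linear optimization formulation the article uses, and it ignores stack frames, ROM, and more than two memory units. The names `Var`, `greedy_allocate`, `optimal_allocate`, and `total_cycles` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical latencies in cycles per access; not taken from the article.
SRAM_LATENCY = 1
DRAM_LATENCY = 10


@dataclass(frozen=True)
class Var:
    name: str
    size: int       # bytes (assumed > 0)
    accesses: int   # access count observed in a profile run

    @property
    def benefit(self) -> int:
        # Cycles saved if this variable is placed in SRAM instead of DRAM.
        return self.accesses * (DRAM_LATENCY - SRAM_LATENCY)


def greedy_allocate(variables, capacity):
    """Place variables in SRAM in order of benefit per byte while they still fit."""
    chosen, used = set(), 0
    for v in sorted(variables, key=lambda v: v.benefit / v.size, reverse=True):
        if used + v.size <= capacity:
            chosen.add(v.name)
            used += v.size
    return chosen


def optimal_allocate(variables, capacity):
    """Exact 0/1 knapsack solved by dynamic programming over SRAM bytes."""
    # best[c] = (max benefit, names placed in SRAM) using at most c bytes.
    best = [(0, frozenset())] * (capacity + 1)
    for v in variables:
        # Walk capacities downward so each variable is placed at most once.
        for c in range(capacity, v.size - 1, -1):
            candidate = best[c - v.size][0] + v.benefit
            if candidate > best[c][0]:
                best[c] = (candidate, best[c - v.size][1] | {v.name})
    return set(best[capacity][1])


def total_cycles(variables, in_sram):
    """Total memory-access cycles for a given SRAM/DRAM partition."""
    return sum(
        v.accesses * (SRAM_LATENCY if v.name in in_sram else DRAM_LATENCY)
        for v in variables
    )


if __name__ == "__main__":
    program = [Var("a", 60, 600), Var("b", 50, 400), Var("c", 50, 400)]
    g = greedy_allocate(program, 100)
    o = optimal_allocate(program, 100)
    print(sorted(g), total_cycles(program, g))  # ['a'] 8600
    print(sorted(o), total_cycles(program, o))  # ['b', 'c'] 6800
```

In this toy example, the greedy pass takes the variable with the best benefit per byte and then cannot fit anything else into the 100-byte SRAM, while the exact solution fills SRAM with the two smaller variables (6800 cycles instead of 8600). The numbers are illustrative only and do not come from the article's benchmarks.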
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software
Cited by (79 articles)
1. MinUn: Accurate ML Inference on Microcontrollers;Proceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems;2023-06-13
2. Optimal Arrangement and Rearrangement of Objects on Shelves to Minimize Robot Retrieval Cost;IEEE Transactions on Automation Science and Engineering;2023
3. Echtzeitfähige Ethernet-Kommunikation in automobilen Multicore-Systemen mit hierarchischem Speicherlayout;Informatik aktuell;2022
4. SPECTRUM;ACM Transactions on Embedded Computing Systems;2020-09-30
5. SoMMA: A software-managed memory architecture for multi-issue processors;Microprocessors and Microsystems;2020-09