An optimal memory allocation scheme for scratch-pad-based embedded systems

Author:

Avissar, Oren (1); Barua, Rajeev (1); Stewart, Dave (2)

Affiliation:

1. University of Maryland, College Park, MD

2. Embedded Research Solutions, Columbia, MD

Abstract

This article presents a technique for the efficient compiler management of software-exposed heterogeneous memory. In many lower-end embedded chips, often used in microcontrollers and DSP processors, heterogeneous memory units such as scratch-pad SRAM, internal DRAM, external DRAM, and ROM are visible directly to the software, without automatic management by a hardware caching mechanism. Instead, the memory units are mapped to different portions of the address space. Caches are avoided due to their cost and power consumption, and because they make it difficult to guarantee real-time performance. For this important class of embedded chips, the allocation of data to different memory units to maximize performance is the responsibility of the software. Current practice typically leaves it to the programmer to partition the data among different memory units. We present a compiler strategy that automatically partitions the data among the memory units. We show that this strategy is optimal, relative to the profile run, among all static partitions for global and stack data. For the first time, our allocation scheme for stacks distributes the stack among multiple memory units. For global and stack data, the scheme is provably equal to or better than any other compiler scheme or set of programmer annotations. Results from our benchmarks show a 44.2% reduction in runtime from using our distributed stack strategy vs. using a unified stack, and a further 11.8% reduction in runtime from using a linear optimization strategy for allocation vs. a simpler greedy strategy; both in the case of the SRAM size being 20% of the total data size. For some programs, less than 5% of data in SRAM achieves a similar speedup.
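The gap the abstract reports between a greedy strategy and an optimizing one can be illustrated with a toy two-level model. The sketch below is not the authors' formulation: it assumes only one SRAM and one DRAM, ignores stack distribution, and uses exhaustive search as a stand-in for a real optimization solver. The variable names, sizes, access counts, capacity, and latencies are all hypothetical values chosen for illustration.

```python
from itertools import product

# Hypothetical profile data: (name, size in bytes, profiled access count).
VARIABLES = [
    ("buf", 60, 600),
    ("table", 50, 450),
    ("coeffs", 50, 450),
]
SRAM_CAPACITY = 100  # bytes (assumed)
SRAM_LATENCY = 1     # cycles per access (assumed)
DRAM_LATENCY = 10    # cycles per access (assumed)


def total_cycles(in_sram):
    """Memory-access cycles when the named variables are placed in SRAM."""
    return sum(
        count * (SRAM_LATENCY if name in in_sram else DRAM_LATENCY)
        for name, _size, count in VARIABLES
    )


def greedy_allocation():
    """Fill SRAM in decreasing order of accesses per byte."""
    chosen, used = set(), 0
    by_density = sorted(VARIABLES, key=lambda v: v[2] / v[1], reverse=True)
    for name, size, _count in by_density:
        if used + size <= SRAM_CAPACITY:
            chosen.add(name)
            used += size
    return chosen


def optimal_allocation():
    """Try every static partition that fits in SRAM and keep the cheapest."""
    best_set, best_cost = set(), total_cycles(set())
    for mask in product([False, True], repeat=len(VARIABLES)):
        picked = [v for v, keep in zip(VARIABLES, mask) if keep]
        if sum(size for _name, size, _count in picked) > SRAM_CAPACITY:
            continue
        names = {name for name, _size, _count in picked}
        cost = total_cycles(names)
        if cost < best_cost:
            best_set, best_cost = names, cost
    return best_set


if __name__ == "__main__":
    for label, alloc in (("greedy", greedy_allocation()),
                         ("optimal", optimal_allocation())):
        print(label, sorted(alloc), total_cycles(alloc))
```

With these made-up numbers, the greedy pass commits to the densest variable first and then cannot fit either of the remaining two, while the exhaustive search finds that the two smaller variables together use the SRAM more effectively. Exhaustive search is exponential in the number of variables, so it is only suitable for a demonstration of this size.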

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Cited by 79 articles.

1. MinUn: Accurate ML Inference on Microcontrollers;Proceedings of the 24th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems;2023-06-13

2. Optimal Arrangement and Rearrangement of Objects on Shelves to Minimize Robot Retrieval Cost;IEEE Transactions on Automation Science and Engineering;2023

3. Echtzeitfähige Ethernet-Kommunikation in automobilen Multicore-Systemen mit hierarchischem Speicherlayout;Informatik aktuell;2022

4. SPECTRUM;ACM Transactions on Embedded Computing Systems;2020-09-30

5. SoMMA: A software-managed memory architecture for multi-issue processors;Microprocessors and Microsystems;2020-09
