Dynamic allocation for scratch-pad memory using compile-time decisions-Reference-Cited by-同舟云学术

Dynamic allocation for scratch-pad memory using compile-time decisions

Published:2006-05 Issue:2 Volume:5 Page:472-511
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Udayakumaran Sumesh¹,Dominguez Angel¹,Barua Rajeev¹

Affiliation:

1. University of Maryland, College Park, MD

Abstract

In this research, we propose a highly predictable, low overhead, and, yet, dynamic, memory-allocation strategy for embedded systems with scratch pad memory. A scratch pad is a fast compiler-managed SRAM memory that replaces the hardware-managed cache. It is motivated by its better real-time guarantees versus cache and by its significantly lower overheads in energy consumption, area, and overall runtime, even with a simple allocation scheme. Primarily scratch pad allocation methods are of two types. First, software-caching schemes emulate the workings of a hardware cache in software. Instructions are inserted before each load/store to check the software-maintained cache tags. Such methods incur large overheads in runtime, code size, energy consumption, and SRAM space for tags and deliver poor real-time guarantees just like hardware caches. A second category of algorithms partitions variables at compile-time into the two banks. However, a drawback of such static allocation schemes is that they do not account for dynamic program behavior. It is easy to see why a data allocation that never changes at runtime cannot achieve the full locality benefits of a cache. We propose a dynamic allocation methodology for global and stack data and program code that; (i) accounts for changing program requirements at runtime, (ii) has no software-caching tags, (iii) requires no runtime checks, (iv) has extremely low overheads, and (v) yields 100% predictable memory access times. In this method, data that is about to be accessed frequently is copied into the scratch pad using compiler-inserted code at fixed and infrequent points in the program. Earlier data is evicted if necessary. When compared to a provably optimal static allocation, results show that our scheme reduces runtime by up to 39.8% and energy by up to 31.3%, on average, for our benchmarks, depending on the SRAM size used. The actual gain depends on the SRAM size, but our results show that close to the maximum benefit in runtime and energy is achieved for a substantial range of small SRAM sizes commonly found in embedded systems. Our comparison with a direct mapped cache shows that our method performs roughly as well as a cached architecture.

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture,Software

Link

https://dl.acm.org/doi/pdf/10.1145/1151074.1151085

Reference48 articles.

1. Appel A. W. and Ginsburg M. 1998. . . . C. Cambridge University Press Cambridge.]] Appel A. W. and Ginsburg M. 1998. Modern Compiler Implementation in C. Cambridge University Press Cambridge.]]

Cited by 69 articles. 订阅此论文施引文献订阅此论文施引文献，注册后可以免费订阅5篇论文的施引文献，订阅后可以查看论文全部施引文献

1. Compiler-Based Memory Encryption for Machine Learning on Commodity Low-Power Devices;Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction;2024-02-17

2. Mapi-Pro: An Energy Efficient Memory Mapping Technique for Intermittent Computing;ACM Transactions on Architecture and Code Optimization;2023-12-14

3. GNN at the Edge: Cost-Efficient Graph Neural Network Processing Over Distributed Edge Servers;IEEE Journal on Selected Areas in Communications;2023-03

4. Pin or Fuse? Exploiting Scratchpad Memory to Reduce Off-Chip Data Transfer in DNN Accelerators;Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization;2023-02-17

5. Optimizing data placement and size configuration for morphable NVM based SPM in embedded multicore systems;Future Generation Computer Systems;2022-10