Overflow-free Compute Memories for Edge AI Acceleration

Published:2023-09-09 Issue:5s Volume:22 Page:1-23
ISSN:1539-9087
Container-title:ACM Transactions on Embedded Computing Systems
language:en
Short-container-title:ACM Trans. Embed. Comput. Syst.

Author:

Ponzina Flavio¹, Rios Marco¹, Levisse Alexandre¹, Ansaloni Giovanni¹, Atienza David¹

Affiliation:

1. École Polytechnique Fédérale de Lausanne (EPFL), Embedded Systems Laboratory, Switzerland

Abstract

Compute memories are memory arrays augmented with dedicated logic to support arithmetic. They support the efficient execution of data-centric computing patterns, such as those characterizing Artificial Intelligence (AI) algorithms. These architectures can provide computing capabilities as part of the memory array structures (In-Memory Computing, IMC) or at their immediate periphery (Near-Memory Computing, NMC). By bringing the processing elements inside (or very close to) storage, compute memories minimize the cost of data access. Moreover, highly parallel (and, hence, high-performance) computations are enabled by exploiting the regular structure of memory arrays. However, the regular layout of memory elements also constrains the data range of inputs and outputs, since the bitwidths of operands and results stored at each address cannot be freely varied. Addressing this challenge, we herein propose a HW/SW co-design methodology combining careful per-layer quantization and inter-layer scaling with lightweight hardware support for overflow-free computation of dot-vector operations. We demonstrate the use of these operations to implement the convolutional and fully connected layers of AI models. We embody our strategy in two implementations, based on IMC and NMC, respectively. Experimental results highlight that an area overhead of only 10.5% (for IMC) and 12.9% (for NMC) is required when interfacing with a 2KB subarray. Furthermore, inferences on benchmark CNNs show negligible accuracy degradation due to quantization relative to equivalent floating-point implementations.
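The bitwidth constraint mentioned in the abstract can be illustrated with a small sketch. This is not the authors' methodology: it only shows, under assumed parameters, how many bits a signed dot product can need in the worst case and how a fixed right shift keeps the stored result inside a fixed word. The function names (`worst_case_bits`, `scaled_dot`) and the 8-bit operand / 16-bit word sizes are hypothetical choices made for illustration.

```python
def worst_case_bits(operand_bits: int, num_terms: int) -> int:
    """Signed bits needed to hold any dot product of `num_terms` products
    of `operand_bits`-wide two's-complement operands (illustrative bound)."""
    # The largest magnitude of a single product is (-2^(b-1))^2 = 2^(2b-2).
    max_magnitude = num_terms * (1 << (2 * operand_bits - 2))
    # A signed word holding +max_magnitude needs one extra bit for the sign.
    return max_magnitude.bit_length() + 1


def scaled_dot(x, w, operand_bits: int, word_bits: int):
    """Dot product followed by a right shift chosen so that the result
    always fits in `word_bits` signed bits. The accumulation itself is
    modeled with Python's unbounded integers (an assumption of this sketch)."""
    shift = max(0, worst_case_bits(operand_bits, len(x)) - word_bits)
    acc = sum(a * b for a, b in zip(x, w))
    return acc >> shift, shift


# Hypothetical example: 64 int8 products stored in a 16-bit word.
result, shift = scaled_dot([-128] * 64, [-128] * 64, operand_bits=8, word_bits=16)
assert -(1 << 15) <= result < (1 << 15)  # the stored value fits in int16
```

Choosing the shift from the worst case guarantees no overflow for any input, at the cost of discarding low-order bits; the paper's per-layer quantization and inter-layer scaling address this trade-off in their own way, which this sketch does not reproduce.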

Funder

EC H2020 WiPLASH

EC H2020 FVLLMONTI

ACCESS – AI Chip Center for Emerging Smart Systems

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Software

Link

https://dl.acm.org/doi/pdf/10.1145/3609387

References: 44 articles.

Cited by 2 articles.

1. An Energy Efficient Soft SIMD Microarchitecture and Its Application on Quantized CNNs;IEEE Transactions on Very Large Scale Integration (VLSI) Systems;2024-06

2. Approximate Fault-Tolerant Neural Network Systems;2024 IEEE European Test Symposium (ETS);2024-05-20