Affiliation:
1. Georgia Institute of Technology, Atlanta, GA, US
Abstract
Emerging non-volatile memory (eNVM) based mixed-signal Compute-in-Memory (CIM) accelerators are of great interest in today's AI accelerator design due to their high energy efficiency. Various CIM architectures and circuit-level designs have been proposed, showing superior hardware performance for deep neural network (DNN) acceleration. However, hardware-aware quantization strategies for CIM-based accelerators have not been systematically explored. There are many design options for mapping neural networks onto CIM systems, and improper strategies can narrow the circuit-level design space and limit the achievable hardware performance; a comprehensive early-stage design space exploration is therefore needed to identify quantization and mapping strategies that deliver better hardware performance. In this paper, we provide a joint algorithm-hardware analysis and compare system-level hardware performance across design options, including quantization algorithms, data representation methods, and analog-to-digital converter (ADC) configurations. This work aims to offer guidelines that help chip architects choose more hardware-friendly design options. According to our evaluation results for CIFAR-10/100 and ImageNet classification, a properly chosen quantization approach and optimal mapping strategy (dynamic fixed-point quantization + 2's complement representation/shifted unsigned INT representation + optimized-precision ADC) achieve ∼2× energy efficiency and 1.2∼1.6× throughput with 5%∼25% reduced area overhead, compared to a naïve strategy (fixed-point quantization + differential-pair number representation + full-precision ADC).
Funder
ASCENT, one of the SRC/DARPA JUMP Centers
Publisher
Association for Computing Machinery (ACM)
Subject
Electrical and Electronic Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications
Cited by
2 articles.