Affiliation:
1. Nanyang Technological University, Singapore
2. HP Inc., United States
Abstract
Crossbar-based In-Memory Processing (IMP) accelerators have been widely adopted to achieve high-speed and low-power computing, especially for deep neural network (DNN) models with numerous weights and high computational complexity. However, floating-point (FP) arithmetic is not compatible with crossbar architectures. In addition, the redundant weights of current DNN models occupy too many crossbars, limiting the efficiency of crossbar accelerators. Meanwhile, owing to the inherent non-ideal behaviors of crossbar devices, such as write variations, pre-trained DNN models suffer from accuracy degradation when they are deployed on a crossbar-based IMP accelerator for inference. Although some approaches have been proposed to address these issues, they often fail to consider the interactions among them and introduce significant hardware overhead for solving each issue individually. To deploy complex models on IMP accelerators, we should compact the model and mitigate the influence of device non-ideal behaviors without introducing significant overhead for each technique.
In this paper, we first propose to reuse the bit-shift units in crossbars to approximately multiply the scaling factors in our quantization scheme, avoiding the use of FP processors. Second, we propose to apply kernel-group pruning and crossbar pruning to eliminate the hardware units for data alignment. We also design a zerorize-recover training process for our pruning method to achieve higher accuracy. Third, we adopt runtime-aware non-ideality adaptation with a self-compensation scheme to mitigate the impact of non-idealities by exploiting the features of crossbars. Finally, we integrate these three optimization procedures into one training process to form a comprehensive learning framework for co-optimization, which achieves higher accuracy. The experimental results indicate that our comprehensive learning framework obtains significant improvements over the original model when inferring on the crossbar-based IMP accelerator, with average reductions in computing power and computing area of 100.02× and 17.37×, respectively. Furthermore, we obtain fully integer-only, pruned, and reliable VGG-16 and ResNet-56 models for the CIFAR-10 dataset on IMP accelerators, with accuracy drops of only 2.19% and 1.26%, respectively, without any hardware overhead.
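To illustrate the first idea (rescaling by bit shifts instead of FP multiplication), the following minimal sketch quantizes a weight tensor with a scaling factor rounded to the nearest power of two, so that rescaling an integer accumulator reduces to a right shift. This is only an illustrative example under the assumption of a symmetric per-tensor quantizer; the function names and parameters are hypothetical and are not taken from the paper.

import numpy as np

def quantize_power_of_two(weights, num_bits=8):
    # Real-valued scale that maps the weight range onto the integer grid.
    max_abs = np.max(np.abs(weights))
    qmax = 2 ** (num_bits - 1) - 1
    real_scale = max_abs / qmax

    # Round the scale to the nearest power of two: scale ~ 2**(-shift),
    # so multiplying by the scale becomes a bit shift.
    shift = int(np.round(-np.log2(real_scale)))
    approx_scale = 2.0 ** (-shift)

    # Integer weights as they would be programmed into the crossbar.
    q_weights = np.clip(np.round(weights / approx_scale),
                        -qmax - 1, qmax).astype(np.int32)
    return q_weights, shift

def rescale_by_shift(accumulator, shift):
    # Rescale an integer accumulator with an arithmetic right shift,
    # emulating reuse of the crossbar's bit-shift units (no FP multiply).
    return accumulator >> shift

# Usage sketch: quantize a small kernel and rescale a dot-product result.
w = (np.random.randn(3, 3) * 0.05).astype(np.float32)
qw, s = quantize_power_of_two(w)
acc = int(qw.sum()) * 7  # stand-in for an integer MAC result
print(qw, s, rescale_by_shift(acc, s))

Restricting the scaling factor to a power of two trades a small quantization error for the ability to drop FP multipliers entirely, which is the motivation for reusing the existing bit-shift units.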
Funder
RIE2020 Industry Alignment Fund – Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner, HP Inc.
Nanyang Technological University, Singapore, under its NAP
Publisher
Association for Computing Machinery (ACM)
Subject
Hardware and Architecture, Software