ReHarvest: An ADC Resource-Harvesting Crossbar Architecture for ReRAM-Based DNN Accelerators

Authors:

Jiahong Xu¹, Haikun Liu¹, Zhuohui Duan¹, Xiaofei Liao¹, Hai Jin¹, Xiaokang Yang¹, Huize Li¹, Cong Liu¹, Fubing Mao¹, Yu Zhang¹

Affiliation:

1. Huazhong University of Science and Technology, Wuhan, China

Abstract

ReRAM-based Processing-In-Memory (PIM) architectures have been increasingly explored to accelerate various Deep Neural Network (DNN) applications because they can achieve extremely high performance and energy efficiency for in-situ analog Matrix-Vector Multiplication (MVM) operations. However, since the peripheral circuits of ReRAM crossbar arrays, particularly analog-to-digital converters (ADCs), often feature high latency and low area efficiency, AD conversion has become a performance bottleneck of in-situ analog MVMs. Moreover, because each crossbar array is tightly coupled with very few ADCs in current ReRAM-based PIM architectures, these scarce ADC resources are often underutilized. In this article, we propose ReHarvest, an ADC-crossbar decoupled architecture that improves ADC utilization. In particular, we design a many-to-many mapping structure between crossbars and ADCs so that all ADCs in a tile are shared as a resource pool, and thus one crossbar array can harvest many more ADCs to parallelize the AD conversion for each MVM operation. Moreover, we propose a multi-tile matrix mapping (MTMM) scheme to further improve ADC utilization across multiple tiles by enhancing data parallelism. To support fine-grained data dispatching for the MTMM, we also design a bus-based interconnection network to multicast input vectors among multiple tiles, eliminating data redundancy and potential network congestion during multicasting. Extensive experimental results show that ReHarvest improves ADC utilization by 3.2×, and achieves a 3.5× performance speedup while reducing ReRAM resource consumption by 3.1× on average compared with FORMS, a state-of-the-art PIM architecture.
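The core idea of the abstract, sharing all ADCs in a tile as a pool so that an active crossbar can harvest idle converters, can be illustrated with a back-of-the-envelope model. The Python sketch below is not taken from the paper; the tile parameters (8 crossbars, 8 ADCs, 128 columns per crossbar) and the one-column-per-ADC-per-cycle conversion model are hypothetical assumptions chosen only to show how pooling reduces per-MVM conversion latency and raises ADC utilization when only a few crossbars in a tile are active.

# Illustrative sketch (assumed parameters, not from the paper): compares a design in
# which ADCs are statically coupled to crossbars against a tile-level shared ADC pool.
import math

def conversion_cycles(columns: int, adcs: int) -> int:
    """Cycles to digitize one MVM result, assuming each ADC converts one column per cycle."""
    return math.ceil(columns / adcs)

def tile_stats(active_crossbars: int, crossbars_per_tile: int = 8,
               adcs_per_tile: int = 8, columns: int = 128) -> dict:
    # Baseline: ADCs are statically partitioned, one fixed slice per crossbar.
    adcs_per_xbar = adcs_per_tile // crossbars_per_tile
    coupled_cycles = conversion_cycles(columns, adcs_per_xbar)

    # Pooled (ReHarvest-style): active crossbars share the whole tile's ADC pool.
    pooled_adcs = adcs_per_tile // active_crossbars
    pooled_cycles = conversion_cycles(columns, pooled_adcs)

    return {
        "coupled_cycles": coupled_cycles,
        "pooled_cycles": pooled_cycles,
        "coupled_adc_util": active_crossbars * adcs_per_xbar / adcs_per_tile,
        "pooled_adc_util": 1.0,  # idle ADCs are harvested by the active crossbars
    }

if __name__ == "__main__":
    # With only 2 of 8 crossbars active, pooling cuts AD-conversion latency by 4x
    # (128 vs. 32 cycles) and raises ADC utilization from 25% to 100% in this toy model.
    print(tile_stats(active_crossbars=2))

This toy model only captures the first mechanism (the per-tile ADC pool); the MTMM scheme and the bus-based multicast network described in the abstract address the complementary problem of keeping multiple tiles busy and are not modeled here.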

Funders

National Key Research and Development Program of China

National Natural Science Foundation of China

Natural Science Foundation of Hubei Province

Huawei

Publisher

Association for Computing Machinery (ACM)

