ReHarvest: An ADC Resource-Harvesting Crossbar Architecture for ReRAM-Based DNN Accelerators

Authors:

Jiahong Xu¹, Haikun Liu¹, Zhuohui Duan¹, Xiaofei Liao¹, Hai Jin¹, Xiaokang Yang¹, Huize Li¹, Cong Liu¹, Fubing Mao¹, Yu Zhang¹

Affiliation:

1. Huazhong University of Science and Technology, Wuhan, China

Abstract

ReRAM-based Processing-In-Memory (PIM) architectures have been increasingly explored to accelerate various Deep Neural Network (DNN) applications because they can achieve extremely high performance and energy efficiency for in-situ analog Matrix-Vector Multiplication (MVM) operations. However, since the peripheral circuits of ReRAM crossbar arrays, particularly analog-to-digital converters (ADCs), often feature high latency and low area efficiency, AD conversion has become a performance bottleneck of in-situ analog MVMs. Moreover, because each crossbar array is tightly coupled with very few ADCs in current ReRAM-based PIM architectures, these scarce ADC resources are often underutilized. In this article, we propose ReHarvest, an ADC-crossbar decoupled architecture that improves ADC utilization. In particular, we design a many-to-many mapping structure between crossbars and ADCs so that all ADCs in a tile are shared as a resource pool, and thus one crossbar array can harvest many more ADCs to parallelize the AD conversion for each MVM operation. Moreover, we propose a multi-tile matrix mapping (MTMM) scheme to further improve ADC utilization across multiple tiles by enhancing data parallelism. To support fine-grained data dispatching for the MTMM, we also design a bus-based interconnection network to multicast input vectors among multiple tiles, eliminating data redundancy and potential network congestion during multicasting. Extensive experimental results show that ReHarvest improves ADC utilization by 3.2×, and achieves a 3.5× performance speedup while reducing ReRAM resource consumption by 3.1× on average compared with FORMS, a state-of-the-art PIM architecture.
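The core idea of the abstract, sharing all ADCs in a tile as a pool so that an active crossbar can harvest idle converters, can be illustrated with a back-of-the-envelope model. The Python sketch below is not taken from the paper; the tile parameters (8 crossbars, 8 ADCs, 128 columns per crossbar) and the one-column-per-ADC-per-cycle conversion model are hypothetical assumptions chosen only to show how pooling reduces per-MVM conversion latency and raises ADC utilization when only a few crossbars in a tile are active.

# Illustrative sketch (assumed parameters, not from the paper): compares a design in
# which ADCs are statically coupled to crossbars against a tile-level shared ADC pool.
import math

def conversion_cycles(columns: int, adcs: int) -> int:
    """Cycles to digitize one MVM result, assuming each ADC converts one column per cycle."""
    return math.ceil(columns / adcs)

def tile_stats(active_crossbars: int, crossbars_per_tile: int = 8,
               adcs_per_tile: int = 8, columns: int = 128) -> dict:
    # Baseline: ADCs are statically partitioned, one fixed slice per crossbar.
    adcs_per_xbar = adcs_per_tile // crossbars_per_tile
    coupled_cycles = conversion_cycles(columns, adcs_per_xbar)

    # Pooled (ReHarvest-style): active crossbars share the whole tile's ADC pool.
    pooled_adcs = adcs_per_tile // active_crossbars
    pooled_cycles = conversion_cycles(columns, pooled_adcs)

    return {
        "coupled_cycles": coupled_cycles,
        "pooled_cycles": pooled_cycles,
        "coupled_adc_util": active_crossbars * adcs_per_xbar / adcs_per_tile,
        "pooled_adc_util": 1.0,  # idle ADCs are harvested by the active crossbars
    }

if __name__ == "__main__":
    # With only 2 of 8 crossbars active, pooling cuts AD-conversion latency by 4x
    # (128 vs. 32 cycles) and raises ADC utilization from 25% to 100% in this toy model.
    print(tile_stats(active_crossbars=2))

This toy model only captures the first mechanism (the per-tile ADC pool); the MTMM scheme and the bus-based multicast network described in the abstract address the complementary problem of keeping multiple tiles busy and are not modeled here.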

Funders

National Key Research and Development Program of China

National Natural Science Foundation of China

Natural Science Foundation of Hubei Province

Huawei

Publisher

Association for Computing Machinery (ACM)

