Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators

Published:2023-08-30 Issue:1 Volume:14 Page:
ISSN:2041-1723
Container-title:Nature Communications
language:en
Short-container-title:Nat Commun

Author:

Rasch Malte J., Mackin Charles, Le Gallo Manuel, Chen An, Fasoli Andrea, Odermatt Frédéric, Li Ning, Nandakumar S. R., Narayanan Pritish, Tsai Hsinyu, Burr Geoffrey W., Sebastian Abu, Narayanan Vijay

Abstract

Analog in-memory computing, a promising approach for energy-efficient acceleration of deep learning workloads, computes matrix-vector multiplications but only approximately, due to nonidealities that are often non-deterministic or nonlinear. This can adversely impact the achievable inference accuracy. Here, we develop a hardware-aware retraining approach to systematically examine the accuracy of analog in-memory computing across multiple network topologies, and investigate sensitivity and robustness to a broad set of nonidealities. By introducing a realistic crossbar model, we improve significantly on earlier retraining approaches. We show that many larger-scale deep neural networks (including convnets, recurrent networks, and transformers) can in fact be successfully retrained to show iso-accuracy with the floating-point implementation. Our results further suggest that nonidealities that add noise to the inputs or outputs, not the weights, have the largest impact on accuracy, and that recurrent networks are particularly robust to all nonidealities.
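The distinction the abstract draws between noise on the inputs, the weights, and the outputs of a matrix-vector multiplication can be illustrated with a small sketch. This is not the paper's crossbar model: the additive Gaussian noise, the function name `noisy_mvm`, and all parameter names and values are assumptions chosen for illustration only.

```python
import random

def noisy_mvm(weights, x, weight_noise=0.0, input_noise=0.0,
              output_noise=0.0, rng=None):
    """Approximate y = W x with additive Gaussian noise (illustrative only).

    The three *_noise arguments are standard deviations of hypothetical
    noise sources; they do not correspond to measured device parameters.
    """
    rng = rng or random.Random(0)
    # Perturb each input element once, before it is applied to every row.
    x_noisy = [xi + rng.gauss(0.0, input_noise) for xi in x]
    y = []
    for row in weights:
        # Each weight is perturbed independently when it is read.
        acc = sum((w + rng.gauss(0.0, weight_noise)) * xi
                  for w, xi in zip(row, x_noisy))
        # Output noise is added once per accumulated result.
        y.append(acc + rng.gauss(0.0, output_noise))
    return y

W = [[0.5, -0.25], [1.0, 0.75]]
x = [1.0, 2.0]

exact = noisy_mvm(W, x)  # all noise levels are zero: the exact product
noisy = noisy_mvm(W, x, weight_noise=0.02, input_noise=0.01,
                  output_noise=0.05)
print(exact, noisy)
```

In a retraining setting, a noisy forward pass of this general kind would stand in for the exact product so that the learned weights tolerate the perturbations; which noise sources matter most is the question the abstract addresses.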

Publisher

Springer Science and Business Media LLC

Subject

General Physics and Astronomy; General Biochemistry, Genetics and Molecular Biology; General Chemistry; Multidisciplinary

Link

https://www.nature.com/articles/s41467-023-40770-4.pdf

References: 88 articles.

1. Sevilla, J. et al. Compute trends across three eras of machine learning. Preprint at https://arxiv.org/abs/2202.05924 (2022).

2. Sze, V., Chen, Y. H., Yang, T. J. & Emer, J. S. Efficient processing of deep neural networks: a tutorial and survey. Proc. IEEE 105, 2295–2329 (2017).

3. Jia, H., Valavi, H., Tang, Y., Zhang, J. & Verma, N. A programmable heterogeneous microprocessor based on bit-scalable in-memory computing. IEEE J. Solid State Circ. 55, 2609–2621 (2020).

4. Reuther, A. et al. AI accelerator survey and trends. In 2021 IEEE High Performance Extreme Computing Conference (HPEC) 1–9 (IEEE, 2021).

5. Wang, S. & Kanwar, P. BFloat16: the secret to high performance on Cloud TPUs. Google Cloud Blog 4, (2019).

Cited by 30 articles.

1. Improving model robustness to weight noise via consistency regularization;Machine Learning: Science and Technology;2024-09-01

2. Fast and robust analog in-memory deep neural network training;Nature Communications;2024-08-20

3. Energy-Efficient Neural Network Acceleration Using Most Significant Bit-Guided Approximate Multiplier;Electronics;2024-08-01

4. Difficulties and approaches in enabling learning-in-memory using crossbar arrays of memristors;Neuromorphic Computing and Engineering;2024-08-01

5. Shrinking the giants: Paving the way for TinyAI;Device;2024-08