Author:
Zhao Junyun, Huang Siyuan, Yousuf Osama, Gao Yutong, Hoskins Brian D., Adam Gina C.
Abstract
While promising for high-capacity machine learning accelerators, memristor devices have non-idealities that prevent software-equivalent accuracies when used for online training. This work uses a combination of Mini-Batch Gradient Descent (MBGD) to average gradients, stochastic rounding to avoid vanishing weight updates, and decomposition methods to keep the memory overhead low during mini-batch training. Since the weight update has to be transferred to the memristor matrices efficiently, we also investigate the impact of reconstructing the gradient matrices both internally (rank-seq) and externally (rank-sum) to the memristor array. Our results show that streaming batch principal component analysis (streaming batch PCA) and non-negative matrix factorization (NMF) decomposition algorithms can achieve near-MBGD accuracy in a memristor-based multi-layer perceptron trained on the MNIST (Modified National Institute of Standards and Technology) database with only 3 to 10 ranks, at significant memory savings. Moreover, NMF rank-seq outperforms streaming batch PCA rank-seq at low ranks, making it more suitable for hardware implementation in future memristor-based accelerators.
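The sketch below (not the authors' implementation) illustrates two of the ideas named in the abstract: stochastic rounding of quantized weight updates so that small updates survive in expectation rather than vanishing, and applying a low-rank approximation of an accumulated mini-batch gradient either as one externally reconstructed matrix ("rank-sum") or as a sequence of rank-1 outer-product programming steps ("rank-seq"). A truncated SVD stands in for the paper's streaming batch PCA and NMF factorizations; the shapes, rank, learning rate, and conductance step are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, step):
    """Round each entry of x to a multiple of `step`, rounding up with
    probability proportional to the remainder, so the rounded update is
    unbiased and small updates do not always collapse to zero."""
    scaled = x / step
    floor = np.floor(scaled)
    return step * (floor + (rng.random(x.shape) < (scaled - floor)))

# Accumulated mini-batch gradient for one layer (illustrative shape).
G = rng.standard_normal((128, 64)) / np.sqrt(64)

# Truncated SVD stands in for the streaming batch PCA / NMF factorizations
# used in the paper; keeping `rank` components bounds the memory overhead.
rank = 5
U, s, Vt = np.linalg.svd(G, full_matrices=False)
A = U[:, :rank] * s[:rank]   # 128 x rank factor
B = Vt[:rank, :]             # rank x 64 factor

lr, step = 0.1, 0.01         # learning rate and conductance quantization step
W = np.zeros_like(G)         # stand-in for the memristor weight matrix

# rank-sum: reconstruct the full low-rank gradient externally, program once.
W_sum = W - stochastic_round(lr * (A @ B), step)

# rank-seq: program one rank-1 outer-product update at a time, rounding each.
W_seq = W.copy()
for k in range(rank):
    W_seq -= stochastic_round(lr * np.outer(A[:, k], B[k]), step)

err = np.linalg.norm(G - A @ B) / np.linalg.norm(G)
print(f"relative low-rank approximation error at rank {rank}: {err:.3f}")
```

Note that rank-seq rounds each rank-1 term separately, so its quantization noise differs from rank-sum's single rounding of the summed reconstruction; this is the kind of trade-off the abstract's rank-seq versus rank-sum comparison probes.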
Funder
Office of Naval Research
George Washington University
National Institute of Standards and Technology
Cited by
2 articles.
1. Neural Network Modeling Bias for Hafnia-based FeFETs;Proceedings of the 18th ACM International Symposium on Nanoscale Architectures;2023-12-18
2. Device Modeling Bias in ReRAM-Based Neural Network Simulations;IEEE Journal on Emerging and Selected Topics in Circuits and Systems;2023-03