Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks

Published: 2005-12-01 Issue: 12 Volume: 17 Page: 2699-2718
ISSN: 0899-7667
Container-title: Neural Computation
Language: en
Short-container-title: Neural Computation

Author:

Werfel Justin¹, Xie Xiaohui², Seung H. Sebastian³

Affiliation:

1. Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

2. Broad Institute of Massachusetts Institute of Technology and Harvard University, Cambridge, MA 02141, U.S.A.

3. Howard Hughes Medical Institute, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

Abstract

Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are sometimes used to overcome these difficulties. We analyze three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. Learning speed is defined as the rate of exponential decay in the learning curves. When the scalar parameter that controls the size of weight updates is chosen to maximize learning speed, node perturbation is slower than direct gradient descent by a factor equal to the number of output units; weight perturbation is slower still by an additional factor equal to the number of input units. Parallel perturbation allows faster learning than sequential perturbation, by a factor that does not depend on network size. We also characterize how uncertainty in quantities used in the stochastic updates affects the learning curves. This study suggests that in practice, weight perturbation may be slow for large networks, and node perturbation can have performance comparable to that of direct gradient descent when there are few output units. However, these statements depend on the specifics of the learning problem, such as the input distribution and the target function, and are not universally applicable.
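The three online methods named in the abstract can be illustrated with a small NumPy sketch. It is not the authors' code. It assumes a noiseless linear teacher, standard Gaussian inputs, squared error, and commonly used forms of the perturbation updates (the error change caused by a random perturbation, multiplied by that perturbation). The function name `learning_curve`, the perturbation size `sigma`, and the step sizes are all assumptions made for this illustration; the step sizes are simply scaled down with the number of outputs and inputs so that this toy run stays stable.

```python
import numpy as np


def learning_curve(method, n_in=10, n_out=3, steps=4000, sigma=1e-3, seed=0):
    """Return the squared weight error ||W - W*||^2 after each online update.

    method: "direct", "node", or "weight" (hypothetical labels for this sketch).
    """
    # Illustrative step sizes (assumed, not taken from the paper).
    step_sizes = {
        "direct": 1.0 / (n_in + 2),
        "node": 1.0 / ((n_in + 2) * (n_out + 2)),
        "weight": 1.0 / ((n_in + 2) * (n_in * n_out + 2)),
    }
    if method not in step_sizes:
        raise ValueError(f"unknown method: {method!r}")
    eta = step_sizes[method]

    rng = np.random.default_rng(seed)
    w_star = rng.standard_normal((n_out, n_in))  # teacher weights (target function)
    w = np.zeros((n_out, n_in))                  # student weights

    curve = np.empty(steps + 1)
    curve[0] = np.sum((w - w_star) ** 2)
    for t in range(steps):
        x = rng.standard_normal(n_in)   # one input example per update
        d = w_star @ x                  # desired output
        err = w @ x - d                 # output error of the unperturbed network
        base = 0.5 * err @ err          # unperturbed squared error

        if method == "direct":
            # Exact gradient of the squared error with respect to W.
            w -= eta * np.outer(err, x)
        elif method == "node":
            # Perturb the output units, then correlate the error change with the noise.
            xi = sigma * rng.standard_normal(n_out)
            perturbed = 0.5 * np.sum((err + xi) ** 2)
            w -= (eta / sigma**2) * (perturbed - base) * np.outer(xi, x)
        else:
            # Weight perturbation: perturb every weight, then correlate as above.
            psi = sigma * rng.standard_normal((n_out, n_in))
            err_p = (w + psi) @ x - d
            perturbed = 0.5 * err_p @ err_p
            w -= (eta / sigma**2) * (perturbed - base) * psi

        curve[t + 1] = np.sum((w - w_star) ** 2)
    return curve


if __name__ == "__main__":
    for name in ("direct", "node", "weight"):
        c = learning_curve(name)
        print(f"{name:>6}: initial error {c[0]:.3g}, final error {c[-1]:.3g}")
```

Plotting the returned curves on a logarithmic axis gives a rough picture of the exponential decay that the abstract uses to define learning speed. The numbers from this sketch depend on the assumed input distribution, teacher, and step sizes, so they should not be read as reproducing the paper's results.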

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/089976605774320539

References: 8 articles.

1. Learning in linear neural networks: a survey

2. On-Line Learning with a Perceptron

3. An analog VLSI recurrent neural network learning a continuous-time trajectory

4. Learning processes in neural networks

5. Weight perturbation: an optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks

Cited by 59 articles.

1. Weight Perturbation and Competitive Hebbian Plasticity for Training Sparse Excitatory Neural Networks; 2024 International Joint Conference on Neural Networks (IJCNN); 2024-06-30

2. Chaotic neural dynamics facilitate probabilistic computations through sampling; Proceedings of the National Academy of Sciences; 2024-04-22

3. Theoretical limits on the speed of learning inverse models explain the rate of adaptation in arm reaching tasks; Neural Networks; 2024-02

4. Training Spiking Neural Networks Using Lessons From Deep Learning; Proceedings of the IEEE; 2023-09

5. Multiplexed gradient descent: Fast online training of modern datasets on hardware neural networks without backpropagation; APL Machine Learning; 2023-06-01