Long Short-Term Memory

Published:1997-11-01 Issue:8 Volume:9 Page:1735-1780
ISSN:0899-7667
Container-title:Neural Computation
language:en
Short-container-title:Neural Computation

Author:

Hochreiter Sepp¹, Schmidhuber Jürgen²

Affiliation:

1. Fakultät für Informatik, Technische Universität München, 80290 München, Germany

2. IDSIA, Corso Elvezia 36, 6900 Lugano, Switzerland

Abstract

Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, back propagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
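
The gating idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names (`cec_step`, `run_cell`, `run_leaky_unit`), the use of `tanh` as the squashing function, the hard-coded gate values of 0.0 and 1.0 (in LSTM the gates are learned), and the 0.9 self-weight of the comparison unit are all assumptions chosen for illustration. It shows a state with a fixed self-connection of weight 1.0 holding a stored value across 1000 steps of distractor input, while a unit with a self-weight below 1.0 loses it.

```python
import math

def cec_step(state, cell_input, input_gate, output_gate):
    """One step of a simplified memory cell (illustrative, not the paper's code).

    The internal state feeds back to itself with a fixed weight of 1.0
    (the constant error carousel), so it changes only when the
    multiplicative input gate lets new input through.
    """
    new_state = state + input_gate * math.tanh(cell_input)
    output = output_gate * math.tanh(new_state)
    return new_state, output

def run_cell(inputs, input_gates, output_gates):
    """Run the cell over a sequence and return the final state and output."""
    state, output = 0.0, 0.0
    for x, i, o in zip(inputs, input_gates, output_gates):
        state, output = cec_step(state, x, i, o)
    return state, output

def run_leaky_unit(inputs, input_gates, self_weight=0.9):
    """Comparison unit whose self-connection weight is below 1.0 (assumed 0.9)."""
    state = 0.0
    for x, i in zip(inputs, input_gates):
        state = self_weight * state + i * math.tanh(x)
    return state

lag = 1000
# Step 0 carries the value to remember; later steps carry distractor input.
inputs = [2.0] + [math.sin(t) for t in range(1, lag + 1)]
# Input gate is open only at step 0; output gate is open only at the last step.
input_gates = [1.0] + [0.0] * lag
output_gates = [0.0] * lag + [1.0]

cell_state, cell_output = run_cell(inputs, input_gates, output_gates)
leaky_state = run_leaky_unit(inputs, input_gates)

print("memory cell state after lag:", cell_state)
print("leaky unit state after lag:", leaky_state)
```

With the input gate closed, the cell's state is carried forward unchanged, which is the forward-pass counterpart of the constant error flow the abstract describes; the comparison unit's state shrinks by a factor of 0.9 at every step.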

Publisher

MIT Press - Journals

Subject

Cognitive Neuroscience, Arts and Humanities (miscellaneous)

Link

https://www.mitpressjournals.org/doi/pdf/10.1162/neco.1997.9.8.1735

References: 20 articles.

1. Contrastive Learning and Neural Oscillations

2. Learning long-term dependencies with gradient descent is difficult

3. Finite State Automata and Simple Recurrent Networks

4. Adaptive neural oscillator using continuous-time back-propagation learning

5. A time-delay neural network architecture for isolated word recognition

Cited by 46350 articles.

1. Graph sequence learning for premise selection;Journal of Symbolic Computation;2025-05

2. Similar assembly state discriminator for reinforcement learning-based robotic connector assembly;Robotics and Computer-Integrated Manufacturing;2025-02

3. TFformer: A time–frequency domain bidirectional sequence-level attention based transformer for interpretable long-term sequence forecasting;Pattern Recognition;2025-02

4. Group link prediction in bipartite graphs with graph neural networks;Pattern Recognition;2025-02

5. HierCode: A lightweight hierarchical codebook for zero-shot Chinese text recognition;Pattern Recognition;2025-02