1. Bahdanau, D., Cho, K., Bengio, Y., 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
2. Learning long-term dependencies with gradient descent is difficult;Bengio;IEEE transactions on neural networks,1994
3. Gers, F.A., Schmidhuber, J., Cummins, F., 1999. Learning to forget: Continual prediction with lstm.
4. Graves, A., 2013. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850.
5. Lstm can solve hard long time lag problems;Hochreiter;Advances in neural information processing systems,1997