1. Learning long-term dependencies with gradient descent is difficult;Bengio;IEEE Trans. Neural Networks,1994
2. A neural probabilistic language model;Bengio;J. Mach. Learn. Res.,2003
3. Learning phrase representations using rnn encoder-decoder for statistical machine translation;Cho;arXiv preprint arXiv:1406.1078,2014
4. J. Chung, C. Gulcehre, K. Cho, Y. Bengio, Gated feedback recurrent neural networks, arXiv preprint arXiv:1502.02367 (2015).
5. Natural language processing (almost) from scratch;Collobert;J. Mach. Learn. Res.,2011