1. Neural machine translation by jointly learning to align and translate;Bahdanau;arXiv:1409.0473,2014
2. Learning long-term dependencies with gradient descent is difficult;Bengio;IEEE Trans. Neural Netw.,1994
3. Semi-Supervised Learning;Chapelle,2006
4. An empirical study of smoothing techniques for language modeling;Chen,1996
5. Learning phrase representations using RNN encoder-decoder for statistical machine translation;Cho;arXiv:1406.1078,2014