1. Exploring the limits of language modeling;Jozefowicz,2016
2. Improved backing-off for m-gram language modeling;Kneser,1995
3. Efficient estimation of word representations in vector space;Mikolov,2013
4. Sequence to sequence learning with neural networks;Sutskever,2014
5. Neural machine translation by jointly learning to align and translate;Bahdanau,2016