1. Adam: A method for stochastic optimization;kingma;Proceedings of the International Conference for Learning Representations,2015
2. fairseq: A Fast, Extensible Toolkit for Sequence Modeling
3. Analyzing uncertainty in neural machine translation;ott;Proceedings of the International Conference on Machine Learning,2018
4. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context
5. Generating sequences with recurrent neural networks;graves;ArXiv Preprint,2013