1. Adamčík M. The information geometry of Bregman divergences and some applications in multi-expert reasoning. Entropy. 2014;16(12):6338–6381.
2. Bahdanau D, Cho K, Bengio Y. Neural machine translation by jointly learning to align and translate. In: International conference on learning representations (ICLR). 2015. Oral presentation.
3. Bai S, Kolter JZ, Koltun V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv e-prints; 2018.
4. Barakat A, Bianchi P. Convergence and dynamical behavior of the Adam algorithm for non-convex stochastic optimization. arXiv e-prints (stat.ML); 2018.
5. Cho K, van Merriënboer B, Gülçehre C, Bougares F, Schwenk H, Bengio Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.