1. Ba J, Mnih V, Kavukcuoglu K (2015) Multiple object recognition with visual attention. International Conference on Learning Representations (ICLR), arXiv:1412.7755
2. Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations (ICLR), arXiv:1409.0473
3. Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization. Association for Computational Linguistics, Ann Arbor, pp 65–72. https://www.aclweb.org/anthology/W05-0909
4. Bello I, Zoph B, Vaswani A, Shlens J, Le QV (2019) Attention augmented convolutional networks. In: Proceedings of the IEEE international conference on computer vision, pp 3286–3295
5. Chen S, Jiang Y-G (2019) Motion guided spatial attention for video captioning. In: Proceedings of the AAAI conference on artificial intelligence, vol 33(01), pp 8191–8198