1. Aafaq N, Akhtar N, Liu W, Gilani SZ, Mian A (2019) Spatio-temporal dynamics and semantic attribute enriched visual encoding for video captioning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 12487–12496
2. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473
3. Banerjee S, Lavie A (2005) METEOR: an automatic metric for MT evaluation with improved correlation with human judgments. In: Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, volume 29, pp. 65–72
4. Baraldi L, Grana C, Cucchiara R (2017) Hierarchical boundary-aware neural encoder for video captioning. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3185–3194, https://doi.org/10.1109/CVPR.2017.339
5. Barbu A, Bridge A, Burchill Z, Corian D, Dickinson S, Fidler S, Michaux A, Mussman S, Narayanaswamy S, Salvi D, et al. (2012) Video in sentences out. arXiv preprint arXiv:1204.2742