1. Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2017. Guided Open Vocabulary Image Captioning with Constrained Beam Search. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Copenhagen, Denmark, 936–945.
2. Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization. Association for Computational Linguistics, Ann Arbor, Michigan, 65–72.
3. Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, and Yan Wang. 2019. Variational Structured Semantic Inference for Diverse Image Captioning. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019).
4. Qi Chen, Chaorui Deng, and Qi Wu. 2022. Learning Distinct and Representative Modes for Image Captioning. arXiv preprint arXiv:2209.08231 (2022).
5. Yangyu Chen, Shuhui Wang, Weigang Zhang, and Qingming Huang. 2018. Less Is More: Picking Informative Frames for Video Captioning. In Computer Vision – ECCV 2018. Springer International Publishing, Cham, 367–384.