Funder
Natural Science Foundation of Guangxi Province
National Natural Science Foundation of China