Abstract
This paper proposes a novel algorithm, called CAPTION, for identifying and correcting errors in automatically generated image captions. The algorithm combines Deep Learning (DL) for object detection in images with Natural Language Processing (NLP) techniques. CAPTION has been tested on the following three tasks: (1) classifying a caption as correct or incorrect; (2) detecting wrong words in the caption; and (3) suggesting text corrections. Results show that, on the error correction task, our method outperforms the other methods evaluated on the same data set. These other methods are generally based exclusively on DL models. This work shows that, although semantics has still not been used to its fullest in this type of task, a combination of DL with NLP tools achieves better overall performance than DL methods alone.
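The three tasks named in the abstract can be illustrated with a toy sketch. It is not the CAPTION algorithm itself, whose details the abstract does not give. It assumes that an object detector has already returned a set of labels for the image, and it uses a hypothetical hand-written noun-to-category table (`CATEGORY`) and a hypothetical helper (`check_caption`) in place of real NLP tooling.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy vocabulary mapping nouns to coarse categories.
# This table is an assumption for illustration, not part of the paper.
CATEGORY: Dict[str, str] = {
    "dog": "animal", "cat": "animal", "horse": "animal",
    "car": "vehicle", "bus": "vehicle", "bicycle": "vehicle",
}


def check_caption(
    caption: str, detected: Set[str]
) -> Tuple[bool, List[str], Dict[str, List[str]]]:
    """Return (is_correct, wrong_words, suggestions) for one caption.

    `detected` stands in for the labels produced by an object detector.
    """
    # Crude tokenization: lowercase, split on whitespace, strip punctuation.
    words = [w.strip(".,!?;:") for w in caption.lower().split()]

    # Task 2: a known noun that the detector did not find is flagged as wrong.
    wrong = [w for w in words if w in CATEGORY and w not in detected]

    # Task 3: suggest detected objects that share the wrong word's category.
    suggestions = {
        w: sorted(d for d in detected if CATEGORY.get(d) == CATEGORY[w])
        for w in wrong
    }

    # Task 1: the caption counts as correct when no word was flagged.
    return (not wrong, wrong, suggestions)


if __name__ == "__main__":
    result = check_caption(
        "A man riding a bicycle next to a dog.",
        {"person", "bicycle", "cat"},
    )
    print(result)
```

In this sketch a single mismatch between caption nouns and detected labels drives all three outputs; a real system would need part-of-speech tagging and a richer semantic resource than the small table assumed here.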
Publisher
Springer Science and Business Media LLC
Cited by
1 article.