CLIP-Prefix for Image Captioning and an Experiment on Blind Image Guessing

Published:2024 Page:189-203
ISSN:1867-8211
Container-title:Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Language:en

Author:

Huynh Triet Minh, Nguyen Duy Linh, Nguyen Thanh Tri, Vu Thuy-Duong Thi, Dang-Ngoc Hanh, Dang Duc Ngoc Minh

Publisher

Springer Nature Switzerland

Link

https://link.springer.com/content/pdf/10.1007/978-3-031-67357-3_14

References (24 articles)

1. Radford, A., et al.: Learning transferable visual models from natural language supervision. In: International Conference on Machine Learning, pp. 8748–8763. PMLR (2021)

2. Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3128–3137 (2015)

3. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: lessons learned from the 2015 MSCOCO image captioning challenge. IEEE Trans. Pattern Anal. Mach. Intell. 39(4), 652–663 (2017). https://doi.org/10.1109/TPAMI.2016.2587640

4. Tanti, M., Gatt, A., Camilleri, K.: What is the role of recurrent neural networks (RNNs) in an image caption generator? In: Proceedings of the 10th International Conference on Natural Language Generation, pp. 51–60. Association for Computational Linguistics, September 2017. https://aclanthology.org/W17-3506

5. Anderson, P., et al.: Bottom-up and top-down attention for image captioning and visual question answering. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6077–6086 (2018)