Generating Image Captions Using Bahdanau Attention Mechanism and Transfer Learning

Published:2022-12-18 Issue:12 Volume:14 Page:2681
ISSN:2073-8994
Container-title:Symmetry
language:en
Short-container-title:Symmetry

Author:

Ayoub Shahnawaz, Gulzar Yonis, Reegu Faheem Ahmad, Turaev Sherzod

Abstract

Automatic image caption prediction is a challenging task in natural language processing. Most researchers have used a convolutional neural network as the encoder and decoder. However, accurate caption prediction requires a model to understand the semantic relationships between the objects present in an image. The attention mechanism performs a linear combination of encoder and decoder states. It emphasizes the semantic information in the caption together with the visual information in the image. In this paper, we incorporate the Bahdanau attention mechanism with two pre-trained convolutional neural networks, Visual Geometry Group (VGG) and InceptionV3, to predict the caption of a given image. The two pre-trained models are used as encoders, and a recurrent neural network is used as the decoder. With the help of the attention mechanism, the two encoders provide semantic context information to the decoder and achieve a Bilingual Evaluation Understudy (BLEU) score of 62.5. Our main goal is to compare the performance of the two pre-trained models, each combined with the Bahdanau attention mechanism, on the same dataset.
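
The attention step mentioned in the abstract can be illustrated with a minimal sketch of Bahdanau (additive) attention in plain Python. This is not the authors' implementation: the function names, dimensions, and random weights are all assumptions made for illustration. Each encoder feature vector is scored against the decoder hidden state as v · tanh(W_feat f_i + W_hid h), the scores are normalised with a softmax, and the context vector is the resulting weighted sum of the encoder features.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(features, hidden, w_feat, w_hid, v):
    """Additive attention: score_i = v . tanh(W_feat f_i + W_hid h).

    features: encoder outputs, one vector per image region (hypothetical layout)
    hidden:   current decoder hidden state
    Returns the context vector and the attention weights.
    """
    projected_hidden = matvec(w_hid, hidden)
    scores = []
    for feature in features:
        projected_feature = matvec(w_feat, feature)
        combined = [math.tanh(a + b)
                    for a, b in zip(projected_feature, projected_hidden)]
        scores.append(sum(vi * ci for vi, ci in zip(v, combined)))
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the encoder features.
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return context, weights


def random_matrix(rows, cols, rng):
    """Small random matrix standing in for learned weights (illustrative only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen for illustration; a real encoder yields far larger tensors.
rng = random.Random(0)
num_regions, feat_dim, hid_dim, attn_dim = 4, 6, 5, 3
features = random_matrix(num_regions, feat_dim, rng)
hidden = [rng.uniform(-0.5, 0.5) for _ in range(hid_dim)]
w_feat = random_matrix(attn_dim, feat_dim, rng)
w_hid = random_matrix(attn_dim, hid_dim, rng)
v = [rng.uniform(-0.5, 0.5) for _ in range(attn_dim)]

context, weights = bahdanau_attention(features, hidden, w_feat, w_hid, v)
```

In a captioning model the weights would be learned jointly with the decoder, and the context vector would be fed to the recurrent decoder at every time step; here the weights are random, so only the shapes and the normalisation are meaningful.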

Funder

United Arab Emirates University

Publisher

MDPI AG

Subject

Physics and Astronomy (miscellaneous),General Mathematics,Chemistry (miscellaneous),Computer Science (miscellaneous)

Link

https://www.mdpi.com/2073-8994/14/12/2681/pdf

References (46 articles)

1. Wang, P., Yang, A., Men, R., Lin, J., Bai, S., Li, Z., Ma, J., Zhou, C., Zhou, J., and Yang, H. (2022). OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework. arXiv, Available online: https://arxiv.org/abs/2202.03052.

2. Hsu, T.Y., Giles, C.L., and Huang, T.H. (2021). Findings of the Association for Computational Linguistics, Findings of ACL: EMNLP 2021, Association for Computational Linguistics.

3. Hossain (2021). Text to Image Synthesis for Improved Image Captioning. IEEE Access.

4. Sehgal, S., Sharma, J., and Chaudhary, N. (2020, January 4–5). Generating Image Captions Based on Deep Learning and Natural Language Processing. Proceedings of the ICRITO 2020 8th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) IEEE, Noida, India.

5. Jain, H., Zepeda, J., Perez, P., and Gribonval, R. (2018, January 18–23). Learning a Complete Image Indexing Pipeline. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA.

Cited by 30 articles.

1. A systematic review of deep learning applications for rice disease diagnosis: current trends and future directions;Frontiers in Computer Science;2024-09-11

2. GAN-enhanced E-nose analysis: VTAAE for temporal dynamics in beef quality assessment;Evolving Systems;2024-09-02

3. State-of-charge estimation hybrid method for lithium-ion batteries using BiGRU and AM co-modified Seq2Seq network and H-infinity filter;Energy;2024-08

4. A Symmetric Efficient Spatial and Channel Attention (ESCA) Module Based on Convolutional Neural Networks;Symmetry;2024-07-25

5. An evaluation of intelligent and immersive digital applications in eliciting cognitive states in humans through the utilization of Emotiv Insight;MethodsX;2024-06