Image captioning using deep learning and python

Published: 2024 Issue: 3 Volume: 18 Page: 59
ISSN: 0973-5151
Container-title: i-manager’s Journal on Software Engineering
Language: en
Short-container-title: JSE

Author:

Manish Nishad¹, Lokeshwari Sahu¹, Gunjan Kumar¹, Rahul Verma¹, Shankar Sharan Tripathi¹, Siddhartha Choubey¹, Madhu Yadav¹

Affiliation:

1. Shri Shankaracharya Technical Campus

Abstract

In recent years, the convergence of computer vision and natural language processing, driven by advances in deep learning, has attracted significant interest. Image captioning is one of its notable applications: the computer describes the visual content of an image in one or more sentences. This requires not only identifying objects and scenes but also analyzing their attributes, states, and interrelations, so that the generated description captures the high-level semantics of the image. Although the task is inherently complex, it has seen remarkable progress thanks to the efforts of numerous researchers. This paper offers a comprehensive review of three prominent image captioning methodologies based on deep neural networks: CNN-RNN, CNN-CNN, and reinforcement-based frameworks. Each approach is accompanied by a detailed analysis of representative works and their respective contributions. Evaluation metrics pertinent to these methods are then discussed, followed by a synthesis of their advantages and primary challenges. The review aims to provide insight into the evolving landscape of image captioning and to highlight avenues for further exploration and innovation.
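
The abstract mentions evaluation metrics without showing how one is computed. As a rough illustration only, and not taken from the paper, the sketch below computes clipped n-gram precision, the building block of BLEU-style scores, for a generated caption against human reference captions. The function name `clipped_ngram_precision` and the example captions are hypothetical, and whitespace tokenization is a simplifying assumption; metrics such as METEOR and SPICE, which appear in the reference list, are considerably more elaborate.

```python
from collections import Counter


def clipped_ngram_precision(candidate, references, n=1):
    """Fraction of candidate n-grams that also occur in a reference.

    Each n-gram is credited at most as many times as it appears in the
    single reference where it is most frequent ("clipping").
    """
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    cand_counts = ngrams(candidate.lower().split())
    if not cand_counts:
        return 0.0

    # For every n-gram, the highest count seen in any one reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.lower().split()).items():
            max_ref[gram] = max(max_ref[gram], count)

    matched = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


if __name__ == "__main__":
    # Hypothetical captions, for illustration only.
    refs = ["a dog is running on the grass", "the dog runs across a field"]
    print(clipped_ngram_precision("a dog runs on the grass", refs, n=1))
    print(clipped_ngram_precision("a dog runs on the grass", refs, n=2))
```

Clipping is what stops a degenerate caption such as "the the the" from scoring perfectly against a reference that contains "the" only once.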

Publisher

i-manager Publications

References: 26 articles.

1. Alzubi, J. A., Jain, R., Nagrath, P., Satapathy, S., Taneja, S., & Gupta, P. (2021). Deep image captioning using an ensemble of CNN and LSTM based deep neural networks. Journal of Intelligent & Fuzzy Systems, 40(4), 5761-5769.

2. SPICE: Semantic Propositional Image Caption Evaluation

3. Aneja, J., Deshpande, A., & Schwing, A. G. (2018). Convolutional image captioning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 5561-5570).

4. Banerjee, S., & Lavie, A. (2005, June). METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization (pp. 65-72).