Affiliation:
1. Shri Shankaracharya Technical Campus
Abstract
In recent years, the intersection of computer vision and natural language processing, driven by advances in deep learning, has attracted significant interest. Image captioning is one of its most prominent applications: a computer describes the content of an image in one or more natural-language sentences. The task requires not only identifying the objects and scenes in an image but also analyzing their attributes, states, and relationships. It then requires generating meaningful descriptions that capture the image's high-level semantics. Although the task is inherently complex, the work of many researchers has produced remarkable progress. This paper presents a comprehensive review of three prominent image captioning approaches built on deep neural networks: CNN-RNN, CNN-CNN, and reinforcement-learning-based frameworks. For each approach, representative works are analyzed in detail and their contributions explained. The evaluation metrics used with these methods are then discussed, followed by a synthesis of their advantages and main challenges. Through this examination, the paper aims to provide insight into the evolving landscape of image captioning and to highlight directions for further research and innovation.