Affiliation:
1. M.H. Saboo Siddik College of Engineering, Byculla, Mumbai, India
Abstract
Image caption generation has long been of great interest to researchers in artificial intelligence. Programming a machine to describe an image or an environment as accurately as an average human would has major applications in robotic vision, business, and many other fields. Automatic caption generation with attention mechanisms aims to generate more descriptive captions that capture coarse-to-fine semantic content in the image, which remains a challenging task. In this paper, we present different image caption generation models based on deep neural networks, focusing on various CNN techniques and analyzing their influence on sentence generation. We also generate captions for sample images and compare the different feature extraction and encoder models to determine which model gives better accuracy and produces the desired results.