Affiliation:
1. Dadi Institute of Engineering & Technology (Autonomous), Anakapalle, India
Abstract
Image captioning aims to describe the content of an image by combining image and text processing techniques. It is a recent and rapidly growing research problem, and new solutions are introduced continually. Although many solutions are already available, considerable attention is still needed to obtain better and more precise results. We therefore developed an image captioning model using different combinations of a Convolutional Neural Network (CNN) architecture with a Long Short-Term Memory (LSTM) network. Three CNN-LSTM combinations were used to build the model: the proposed model is trained with three CNN architectures, Inception-v3, Xception, and ResNet50, for feature extraction, and an LSTM for generating the relevant captions. The best of these combinations is selected based on model accuracy. The model is trained on the Flickr8k dataset.
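The paper itself does not include code, but the pipeline it describes, a pretrained CNN as a feature extractor feeding an LSTM caption decoder, can be sketched as below. This is a minimal illustration assuming the common "merge" captioning architecture; the layer sizes, vocabulary size, and maximum caption length are assumptions for Flickr8k, not values reported in the paper.

```python
# Minimal sketch of the CNN + LSTM captioning pipeline described above.
# Hyperparameters (embedding size, vocab size, max caption length) are
# illustrative assumptions, not values from the paper.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM, Dropout, add
from tensorflow.keras.models import Model

VOCAB_SIZE = 8000   # assumed caption vocabulary size for Flickr8k
MAX_LEN = 34        # assumed maximum caption length in tokens

# Feature extractor: InceptionV3 with the classification head removed;
# Xception or ResNet50 can be swapped in the same way. Image features
# are typically precomputed once with cnn.predict(...) before training.
cnn = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

# Decoder: the image feature vector and the partial caption are merged,
# and the model predicts the next word of the caption.
img_in = Input(shape=(2048,))  # InceptionV3 pooled feature size
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

cap_in = Input(shape=(MAX_LEN,))
cap_emb = Embedding(VOCAB_SIZE, 256, mask_zero=True)(cap_in)
cap_vec = LSTM(256)(Dropout(0.5)(cap_emb))

merged = add([img_vec, cap_vec])
out = Dense(VOCAB_SIZE, activation="softmax")(Dense(256, activation="relu")(merged))

model = Model(inputs=[img_in, cap_in], outputs=out)
model.compile(loss="categorical_crossentropy", optimizer="adam")
```

At inference time, captions are generated word by word: starting from a start token, the decoder is called repeatedly, appending the most probable next word until an end token or MAX_LEN is reached.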