Affiliation:
1. Wojskowa Akademia Techniczna Wydział Cybernetyki
Abstract
This paper describes an image caption generation system based on deep neural networks. The model is trained to maximize the probability of the generated sentence given the image. It employs transfer learning in the form of pretrained convolutional neural networks to preprocess the image data. Each dataset item consists of a still photograph and five associated English captions. The constructed model is compared with other similarly constructed models using the BLEU score, and ways to further improve its performance are proposed.
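The BLEU comparison mentioned in the abstract can be sketched in plain Python. The following is a minimal sentence-level BLEU (modified n-gram precision combined by a geometric mean and scaled by a brevity penalty), not the exact evaluation code used in the paper; the function names are illustrative.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU: geometric mean of modified n-gram
    precisions (n = 1..max_n) times a brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        # Clip each candidate n-gram count by its maximum count
        # in any single reference caption.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = max(1, sum(cand_counts.values()))
        precisions.append(clipped / total)
    if min(precisions) == 0:
        return 0.0  # any zero precision zeroes the geometric mean
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty against the reference length closest to the candidate.
    ref_len = min((len(r) for r in references),
                  key=lambda l: (abs(l - len(candidate)), l))
    bp = 1.0 if len(candidate) > ref_len else \
        math.exp(1 - ref_len / max(1, len(candidate)))
    return bp * math.exp(log_avg)
```

A candidate identical to a reference scores 1.0; captions shorter than four tokens score 0.0 under plain (unsmoothed) BLEU-4, which is why smoothed variants are common in captioning work.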
References (13 articles):
1. Farhadi A., et al., “Every picture tells a story: Generating sentences from images”, Computer Vision – ECCV 2010, LNCS 6314, pp. 15–29, Springer, 2010.
2. Mitchell M., et al., “Midge: Generating image descriptions from computer vision detections”, in: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, pp. 747–756, April 2012.
3. Bai S., An S., “A survey on automatic image caption generation”, Neurocomputing, Vol. 311, pp. 291–304 (2018).
4. Mikolov T., et al., “Efficient estimation of word representations in vector space”, arXiv preprint arXiv:1301.3781, September 2013.
5. Tanti M., et al., “Where to put the image in an image caption generator”, Natural Language Engineering, Vol. 24(3), pp. 467–489 (2018).