Affiliation:
1. Institute of Control and Industrial Electronics, Warsaw University of Technology, ul. Koszykowa 75, 00-662 Warszawa, Poland
Abstract
Image captioning aims to generate meaningful verbal descriptions of a digital image. The field is growing rapidly thanks to the enormous increase in available computational resources. The most advanced methods are, however, resource-demanding. In this paper, we return to the encoder–decoder deep-learning model and investigate how replacing its components with newer equivalents improves overall effectiveness. Our primary motivation is to obtain the greatest possible improvement of classic methods, which remain applicable in environments with limited computational resources, where the most advanced models are too heavy to be applied efficiently. We investigate image feature extractors, recurrent neural networks, word embedding models, and word generation layers, and discuss how each component influences the captioning model's overall performance. Our experiments are performed on the MS COCO 2014 dataset. The results show that replacing components improves the quality of the generated image captions. These findings will help in designing efficient models with optimal combinations of components.