Supervised Deep Learning Techniques for Image Description: A Systematic Review

Published: 2023-03-23 Volume: 25 Issue: 4 Page: 553
ISSN: 1099-4300
Container-title: Entropy
Language: en
Short-container-title: Entropy

Author:

López-Sánchez Marco¹, Hernández-Ocaña Betania¹, Chávez-Bosquez Oscar¹, Hernández-Torruco José¹

Affiliation:

1. División Académica de Ciencias y Tecnologías de la Información, Universidad Juárez Autónoma de Tabasco, Cunduacán 86690, Tabasco, Mexico

Abstract

Automatic image description, also known as image captioning, aims to describe the elements included in an image and their relationships. This task involves two research fields: computer vision and natural language processing; thus, it has received much attention in computer science. In this review paper, we follow the Kitchenham review methodology to present the most relevant approaches to image description methodologies based on deep learning. We focused on works using convolutional neural networks (CNN) to extract the characteristics of images and recurrent neural networks (RNN) for automatic sentence generation. As a result, 53 research articles using the encoder-decoder approach were selected, focusing only on supervised learning. The main contributions of this systematic review are: (i) to describe the most relevant image description papers implementing an encoder-decoder approach from 2014 to 2022 and (ii) to determine the main architectures, datasets, and metrics that have been applied to image description.
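As a rough illustration of the encoder-decoder idea the abstract refers to, the following minimal sketch pairs a single convolutional layer (a stand-in for a CNN encoder) with a simple recurrent decoder that emits tokens greedily. It is not taken from the review or from any of the surveyed papers: the vocabulary, the layer sizes, the random untrained weights, and all function names are assumptions chosen only to make the data flow visible, so the generated caption is meaningless.

```python
import math
import random

# Hypothetical toy vocabulary; real systems learn one from a captioning dataset.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]


def conv2d_relu(image, kernel):
    """Valid 2D cross-correlation followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(kh) for b in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def encode(image, kernels):
    """Encoder: one feature map per kernel, reduced by global average pooling."""
    features = []
    for kernel in kernels:
        cells = [v for row in conv2d_relu(image, kernel) for v in row]
        features.append(sum(cells) / len(cells))
    return features


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def decode(features, params, max_len=8):
    """Decoder: simple RNN whose initial hidden state comes from the image features."""
    w_init, w_xh, w_hh, w_hy = params
    h = [math.tanh(x) for x in matvec(w_init, features)]
    token = VOCAB.index("<start>")
    caption = []
    for _ in range(max_len):
        x = [1.0 if i == token else 0.0 for i in range(len(VOCAB))]  # one-hot input
        pre = [a + b for a, b in zip(matvec(w_xh, x), matvec(w_hh, h))]
        h = [math.tanh(p) for p in pre]
        logits = matvec(w_hy, h)
        token = max(range(len(VOCAB)), key=lambda i: logits[i])  # greedy choice
        if VOCAB[token] == "<end>":
            break
        caption.append(VOCAB[token])
    return caption


def build_model(seed=0, n_kernels=4, hidden=5):
    """Random, untrained weights (assumption: sizes are arbitrary)."""
    rng = random.Random(seed)
    kernels = [rand_matrix(rng, 3, 3) for _ in range(n_kernels)]
    params = (
        rand_matrix(rng, hidden, n_kernels),   # features -> initial hidden state
        rand_matrix(rng, hidden, len(VOCAB)),  # input token -> hidden
        rand_matrix(rng, hidden, hidden),      # hidden -> hidden
        rand_matrix(rng, len(VOCAB), hidden),  # hidden -> vocabulary logits
    )
    return kernels, params


kernels, params = build_model()
# Synthetic 6x6 grayscale "image" with values in [0, 1].
image = [[((r * c) % 7) / 6.0 for c in range(6)] for r in range(6)]
caption = decode(encode(image, kernels), params)
```

In a supervised setting, the weights would be fitted on image-caption pairs rather than drawn at random; the sketch only shows how image features condition the sentence generator.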

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/25/4/553/pdf

References: 83 articles.

1. Phung, D., Tseng, V.S., Webb, G.I., Ho, B., Ganji, M., and Rashidi, L. (2018, January 3–6). Text Generation Based on Generative Adversarial Nets with Latent Variables. Proceedings of the Advances in Knowledge Discovery and Data Mining, Melbourne, VIC, Australia.

2. Dai, B., Fidler, S., Urtasun, R., and Lin, D. (2017, January 22–29). Towards Diverse and Natural Image Descriptions via a Conditional GAN. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.

3. Shetty, R., Rohrbach, M., Anne Hendricks, L., Fritz, M., and Schiele, B. (2017, January 22–29). Speaking the Same Language: Matching Machine to Human Captions by Adversarial Training. Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy.

4. Nayak, A.C., and Sharma, A. (2019, January 26–30). Towards Generating Stylized Image Captions via Adversarial Training. Proceedings of the PRICAI 2019: Trends in Artificial Intelligence, Cuvu, Yanuca Island, Fiji.

5. Jiang (2021). Multi-Gate Attention Network for Image Captioning. IEEE Access.

Cited by 5 articles.

1. Machine learning approaches to detect hepatocyte chromatin alterations from iron oxide nanoparticle exposure;Scientific Reports;2024-08-23

2. Optimizing image captioning: The effectiveness of vision transformers and VGG networks for remote sensing;Big Data Research;2024-08

3. AutoST-Net: A Spatiotemporal Feature-Driven Approach for Accurate Forest Fire Spread Prediction from Remote Sensing Data;Forests;2024-04-17

4. Artificial intelligence strategies based on run length matrix and wavelet analyses for detection of subtle alterations in hepatocyte chromatin organization following exposure to iron oxide nanoparticles;2024-02-13

5. Transforming Healthcare: Leveraging Vision-Based Neural Networks for Smart Home Patient Monitoring;International Journal of Online and Biomedical Engineering (iJOE);2023-08-01