Features to Text: A Comprehensive Survey of Deep Learning on Semantic Segmentation and Image Captioning

Published:2021-03-18 Issue: Volume:2021 Page:1-19
ISSN:1099-0526
Container-title:Complexity
language:en
Short-container-title:Complexity

Author:

Oluwasammi Ariyo¹, Aftab Muhammad Umar², Qin Zhiguang¹, Ngo Son Tung³, Doan Thang Van³, Nguyen Son Ba³, Nguyen Son Hoang³, Nguyen Giang Hoang³

Affiliation:

1. School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054, China

2. Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Chiniot-Faisalabad Campus, Chiniot 35400, Pakistan

3. ICT Department, FPT University, Hanoi 10000, Vietnam

Abstract

With the emergence of deep learning, computer vision has advanced extensively and has found applications in many domains. In particular, image captioning has become an attractive research direction for many machine learning experts; it presupposes object identification, localization, and semantic understanding. In this paper, semantic segmentation and image captioning are comprehensively investigated, covering both traditional and state-of-the-art methodologies. In this survey, we discuss the use of deep learning techniques for the segmentation analysis of both 2D and 3D images using fully convolutional networks and other high-level hierarchical feature extraction methods. First, each domain’s preliminaries and concepts are described, and then semantic segmentation is discussed alongside its relevant features, available datasets, and evaluation criteria. The capture of semantic information about objects and their attributes is also presented in relation to annotation generation. Finally, the existing methods, their contributions, and their relevance are analyzed, highlighting the importance of these methods and pointing to possible directions for further research on the application of semantic image segmentation and image captioning approaches.

Funder

NSFC-Guangdong Joint Fund

Publisher

Hindawi Limited

Subject

Multidisciplinary, General Computer Science

Link

http://downloads.hindawi.com/journals/complexity/2021/5538927.pdf

Reference: 199 articles.
