Voice-Based Image Captioning System for Assisting Visually Impaired People Using Neural Networks

Published:2022-09-30 Page:177-199
ISSN:2327-0411
Container-title:Principles and Applications of Socio-Cognitive and Affective Computing

Author:

M. Nivedita¹, Y. Asnath Victy Phamila¹, Kumaravelan Umashankar², N. Karthikeyan³

Affiliation:

1. Vellore Institute of Technology, Chennai, India

2. Independent Researcher, India

3. Syed Ammal Engineering College, India

Abstract

Visual impairment affects many people worldwide. The authors propose a novel image captioning model, built on a deep learning architecture, to assist blind people. Automatically understanding an image and describing it involves tasks from two complex fields: computer vision and natural language processing. The first task is to correctly identify the objects in the given image along with their attributes; the next is to connect the identified objects with their actions and generate syntactically correct statements. Features are extracted from real-time video using a convolutional neural network (CNN), and the feature vectors are given as input to a long short-term memory (LSTM) network, which generates appropriate captions in natural language (English). The captions can then be converted into audio files that visually impaired people can listen to. The model is tested on two standard image captioning datasets, Flickr 8K and MSCOCO, and evaluated using the BLEU score.
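
The abstract names BLEU as the evaluation metric but does not say which variant or n-gram order was used. As a rough illustration only, the sketch below computes a sentence-level BLEU score in the standard way: clipped n-gram precision, a brevity penalty, and a geometric mean with uniform weights. The function names (`ngrams`, `modified_precision`, `bleu`), the default `max_n=4`, and the absence of smoothing are assumptions made for this example, not details taken from the chapter.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most as
    many times as it occurs in the single reference that contains it most."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (assumed setup)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, one zero precision makes the geometric mean zero.
        return 0.0
    # Reference length closest to the candidate length (ties go to the shorter).
    ref_len = min((abs(len(r) - len(candidate)), len(r)) for r in references)[1]
    # Brevity penalty: penalise candidates no longer than the closest reference.
    if len(candidate) > ref_len:
        bp = 1.0
    else:
        bp = math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


if __name__ == "__main__":
    generated = "a dog runs on the grass".split()
    human = ["a dog is running on the grass".split(),
             "a dog runs across the grass".split()]
    print(f"BLEU-4: {bleu(generated, human):.3f}")
```

Sentence-level BLEU without smoothing drops to zero whenever any n-gram order has no match, so reported results are usually computed over a whole test set or with a smoothing method; an established implementation is preferable to this sketch for real evaluation.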

Publisher

IGI Global
