Deep Learning Reader for Visually Impaired

Published: 2022-10-16 Issue: 20 Volume: 11 Page: 3335
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Ganesan Jothi, Azar Ahmad Taher, Alsenan Shrooq, Kamal Nashwa Ahmad, Qureshi Basit, Hassanien Aboul Ella

Abstract

Recent advances in machine and deep learning algorithms and enhanced computational capabilities have revolutionized healthcare and medicine. Research on assistive technology has benefited from such advances in creating visual substitution for visual impairment. People with visual impairment face several obstacles in reading printed text, which is normally substituted with a pattern-based display known as Braille. Over the past decade, more wearable and embedded assistive devices and solutions have been created for people with visual impairment to facilitate the reading of texts. However, assistive tools for comprehending the embedded meaning in images or objects are still limited. In this paper, we present a deep learning approach for people with visual impairment that addresses this issue with a voice-based form to represent and illustrate images embedded in printed texts. The proposed system is divided into three phases: collecting input images, extracting features for training the deep learning model, and evaluating performance. The proposed approach leverages deep learning algorithms, namely, Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM), for extracting salient features, captioning images, and converting written text to speech. The CNN is implemented for detecting features from the printed image and its associated caption. The LSTM network is used as a captioning tool to describe the detected text from images. The identified captions and detected text are converted into a voice message for the user via a Text-To-Speech API. The proposed CNN-LSTM model is investigated using various network architectures, namely, GoogleNet, AlexNet, ResNet, SqueezeNet, and VGG16. The empirical results show that the CNN-LSTM based training model with the ResNet architecture achieved the highest image-caption prediction accuracy, 83%.
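
The data flow described in the abstract (image features from a CNN, word-by-word captioning by an LSTM, then text-to-speech) can be illustrated with a minimal sketch. This is not the authors' implementation: every name below (`greedy_caption`, `read_aloud`, `toy_encode`, `toy_step`) is hypothetical, the toy encoder and decoder are stand-ins for a trained CNN and LSTM, and a plain list stands in for the Text-To-Speech API. Greedy decoding is assumed here for simplicity; the abstract does not state which decoding strategy the paper uses.

```python
from typing import Callable, Dict, List, Sequence, Tuple

START, END = "<start>", "<end>"

Features = Sequence[float]
# One decoder step: (image features, previous token, recurrent state)
# -> (score per candidate token, updated state). An LSTM cell would fill this role.
DecoderStep = Callable[[Features, str, object], Tuple[Dict[str, float], object]]


def greedy_caption(features: Features, step: DecoderStep, max_len: int = 20) -> List[str]:
    """Generate a caption by repeatedly picking the highest-scoring next token."""
    tokens: List[str] = []
    prev, state = START, None
    for _ in range(max_len):
        scores, state = step(features, prev, state)
        prev = max(scores, key=scores.get)
        if prev == END:
            break
        tokens.append(prev)
    return tokens


def read_aloud(image, encode, step, speak, printed_text: str = "") -> str:
    """Encode the image, caption it, and pass caption plus printed text to `speak`."""
    caption = " ".join(greedy_caption(encode(image), step))
    message = f"{printed_text} Image: {caption}.".strip()
    speak(message)  # in a real system this would call a text-to-speech service
    return message


# --- Toy stand-ins (assumptions for illustration only, not trained models) ---
def toy_encode(image: List[List[float]]) -> List[float]:
    """Stand-in for a CNN: reduces the image to its mean brightness."""
    flat = [p for row in image for p in row]
    return [sum(flat) / len(flat)]


def toy_step(features, prev, state):
    """Stand-in for an LSTM step: follows a fixed script chosen by brightness."""
    tone = "bright" if features[0] > 0.5 else "dark"
    script = ["a", tone, "picture", END]
    i = 0 if state is None else state
    scores = {w: 0.0 for w in ["a", "bright", "dark", "picture", END]}
    scores[script[i]] = 1.0
    return scores, i + 1


spoken: List[str] = []  # collects what would be sent to the speech engine
message = read_aloud([[0.9, 0.8], [0.7, 1.0]], toy_encode, toy_step,
                     spoken.append, printed_text="Figure 1.")
```

Passing the encoder, decoder step, and speech function in as arguments mirrors the paper's comparison of several CNN backbones: swapping one backbone for another would only change the `encode` argument in this sketch.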

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/11/20/3335/pdf

References: 62 articles.

1. Applications of Machine Learning in Real-Life Digital Health Interventions: Review of the Literature

2. A survey on Assistive Technology for visually impaired

3. Machine learning in biomedical engineering

4. Machine learning of neuroimaging for assisted diagnosis of cognitive impairment and dementia: A systematic review

5. The prevalence of concurrent hearing and vision impairment in the United States;Swenor;JAMA Intern. Med.,2013

Cited by 15 articles.

1. SignSense: AI Framework for Sign Language Recognition;International Journal of Advanced Research in Science, Communication and Technology;2024-04-14

2. An Improved Robust Fuzzy Local Information K-Means Clustering Algorithm for Diabetic Retinopathy Detection;IEEE Access;2024

3. Dynamic video summarisation using stacked encoder-decoder architecture with residual learning network;International Journal of Intelligent Engineering Informatics;2024

4. A real-time image captioning framework using computer vision to help the visually impaired;Multimedia Tools and Applications;2023-12-22

5. Framework for Face recognition and Scene Description using Deep Learning for Visually Challenged people;2023 International Conference on Emerging Research in Computational Science (ICERCS);2023-12-07