Object Recognition System for the Visually Impaired: A Deep Learning Approach using Arabic Annotation

Published: 2023-01-20 Issue: 3 Volume: 12 Page: 541
ISSN: 2079-9292
Container-title: Electronics
Language: en
Short-container-title: Electronics

Author:

Alzahrani Nada, Al-Baity Heyam H.

Abstract

Object detection is an important computer vision technique that has increasingly attracted the attention of researchers in recent years. The literature has introduced a range of object detection models. However, these models have largely been English-language-based, and only a limited number of published studies have addressed how object detection can be implemented for the Arabic language. As far as we are aware, the generation of an Arabic text-to-speech (TTS) engine to utter objects’ names and their positions in images to help Arabic-speaking visually impaired people has not been investigated previously. Therefore, in this study, we propose an object detection and segmentation model based on the Mask R-CNN algorithm that is capable of identifying and locating different objects in images, then uttering their names and positions in Arabic. The proposed model was trained on the Pascal VOC 2007 and 2012 datasets and evaluated on the Pascal VOC 2007 testing set. We believe that this is one of the few studies that use these datasets to train and test the Mask R-CNN model. The performance of the proposed object detection model was evaluated and compared with previous object detection models in the literature, and the results demonstrated its superiority and ability to achieve an accuracy of 83.9%. Moreover, experiments were conducted to evaluate the performance of the incorporated translator and TTS engines, and the results showed that the proposed model could be effective in helping Arabic-speaking visually impaired people understand the content of digital images.
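The step of turning a detection into a spoken Arabic description can be illustrated with a minimal sketch. It is not the authors' implementation: the 3x3 position grid, the function names, the small label dictionary, and the sentence template are all assumptions made for illustration. The sketch takes a class label and a bounding box (as a detector such as Mask R-CNN might return), maps the box centre to a coarse position, and builds an Arabic phrase that could then be passed to a TTS engine.

```python
# Hypothetical Arabic names for a few Pascal VOC classes (illustrative only,
# not taken from the paper).
ARABIC_LABELS = {"person": "شخص", "cat": "قطة", "dog": "كلب", "car": "سيارة"}

# Hypothetical Arabic words for the rows and columns of a 3x3 grid.
VERTICAL_AR = {"top": "أعلى", "middle": "وسط", "bottom": "أسفل"}
HORIZONTAL_AR = {"left": "يسار", "center": "وسط", "right": "يمين"}


def describe_position(box, width, height):
    """Map the centre of a box (x_min, y_min, x_max, y_max) to a 3x3 grid cell."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    # Clamp to index 2 so a centre on the right or bottom edge stays in the grid.
    col = min(int(3 * cx / width), 2)
    row = min(int(3 * cy / height), 2)
    return ("top", "middle", "bottom")[row], ("left", "center", "right")[col]


def arabic_description(label, box, width, height):
    """Build an Arabic phrase such as '<object> in the top left of the image'."""
    vertical, horizontal = describe_position(box, width, height)
    # Fall back to the original label when no Arabic name is known.
    name = ARABIC_LABELS.get(label, label)
    if vertical == "middle" and horizontal == "center":
        # Avoid repeating the same word for the central cell.
        return f"{name} في وسط الصورة"
    return f"{name} في {VERTICAL_AR[vertical]} {HORIZONTAL_AR[horizontal]} الصورة"


if __name__ == "__main__":
    # A cat detected near the top-left corner of a 600x600 image.
    print(arabic_description("cat", (0, 0, 100, 100), 600, 600))
```

A real system would take the label and box from the detector output and hand the resulting string to a TTS engine; both of those steps are outside this sketch.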

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/3/541/pdf

References: 36 articles.

1. Object Detection With Deep Learning: A Review;Zhao;IEEE Trans. Neural Netw. Learn. Syst.,2019

2. Application of Deep Learning for Object Detection;Pathak;Procedia Comput. Sci.,2018

3. Girshick, R., Donahue, J., Darrell, T., and Malik, J. (2014, January 23–28). Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA.

4. Girshick, R. (2015, January 7–13). Fast R-CNN. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile.

5. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks;Ren;IEEE Trans. Pattern Anal. Mach. Intell.,2017

Cited by 6 articles.

1. A systematic literature review of visual feature learning: deep learning techniques, applications, challenges and future directions;Multimedia Tools and Applications;2024-07-20

2. Estimation of Muscle Forces of Lower Limbs Based on CNN–LSTM Neural Network and Wearable Sensor System;Sensors;2024-02-05

3. Digital Muhadathah: Framework Model Development for Digital Arabic Language Learning;Lecture Notes in Networks and Systems;2024

4. Real-time obstacle detection for visually impaired people using deep learning;2023 6th International Conference on Signal Processing and Information Security (ICSPIS);2023-11-08

5. Enhancing Object Detection for VIPs Using YOLOv4_Resnet101 and Text-to-Speech Conversion Model;Multimodal Technologies and Interaction;2023-08-02