Enhancing Object Detection for VIPs Using YOLOv4_Resnet101 and Text-to-Speech Conversion Model

Author:

Alahmadi Tahani Jaser (1), Rahman Atta Ur (2), Alkahtani Hend Khalid (1), Kholidy Hisham (3)

Affiliation:

1. Department of Information Systems, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University (PNU), P.O. Box 84428, Riyadh 11671, Saudi Arabia

2. Faculty of Computer Science and Engineering, GIK Institute of Engineering Sciences and Technology, Swabi 23640, Pakistan

3. Department of Networks and Computer Security, SUNY Polytechnic Institute, College of Engineering, Utica, NY 13502, USA

Abstract

Vision impairment affects an individual’s quality of life, posing challenges for visually impaired people (VIPs) in areas such as object recognition and daily tasks. Previous research has focused on developing visual navigation systems to assist VIPs, but further improvements are needed in accuracy, speed, and coverage of the wider range of object categories that may obstruct VIPs’ daily lives. This study presents a modified YOLOv4 model, YOLOv4_Resnet101, that uses ResNet-101 as its backbone network and is trained on multiple object classes to assist VIPs in navigating their surroundings. Compared with the Darknet backbone used in YOLOv4, the ResNet-101 backbone in YOLOv4_Resnet101 offers a deeper and more powerful feature extraction network. ResNet-101’s greater capacity enables better representation of complex visual patterns, which increases the accuracy of object detection. The proposed model is validated using the Microsoft Common Objects in Context (MS COCO) dataset. Image pre-processing techniques are employed to enhance the training process, and manual annotation ensures accurate labeling of all images. The system incorporates text-to-speech conversion, providing VIPs with auditory information to assist in obstacle recognition. After 4000 training iterations, the model achieves an accuracy of 96.34% on test images drawn from the dataset, with a loss error rate of 0.073%.
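The step from detector output to auditory feedback can be illustrated with a minimal sketch. It is not the authors' implementation: the `Detection` structure, the `describe` function, and the 0.5 confidence threshold are all assumptions made for illustration. The sketch only builds the sentence that a text-to-speech engine would read aloud; no speech library is called.

```python
from collections import Counter
from typing import NamedTuple


class Detection(NamedTuple):
    """One detector output (hypothetical structure, not taken from the paper)."""
    label: str         # class name, e.g. an MS COCO category
    confidence: float  # detector score in [0, 1]


def describe(detections, min_confidence=0.5):
    """Build a short sentence announcing the confidently detected objects.

    The 0.5 default threshold is an illustrative assumption.
    Pluralization of labels is omitted for brevity.
    """
    kept = [d.label for d in detections if d.confidence >= min_confidence]
    if not kept:
        return "No obstacles detected."
    counts = Counter(kept)
    # Sort by label so the announcement order is deterministic.
    parts = [f"{n} {label}" for label, n in sorted(counts.items())]
    return "Detected: " + ", ".join(parts) + "."
```

The returned string could then be passed to whichever text-to-speech engine the deployment uses. Keeping the sentence-building step separate from speech synthesis makes it easy to test without audio hardware.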

Funder

King Salman Center for Disability Research

Publisher

MDPI AG

Subject

Computer Networks and Communications, Computer Science Applications, Human-Computer Interaction, Neuroscience (miscellaneous)

Cited by 7 articles.

1. Empowering Independence through Real Time Object Identification and Navigation for People with Disabilities;International Journal of Advanced Research in Science, Communication and Technology;2024-02-08

2. Dual Kernel Support Vector-based Crossover Red Fox Algorithm: Advancements in Assistive Technology for Hearing-impaired Individuals;Journal of Disability Research;2024

3. Adaptative Access Management in 5G IoE using Device Fingerprinting: Discourse, Mechanisms, Challenges, and Opportunities;2023 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA);2023-12-04

4. Secure the 5G and Beyond Networks with Zero Trust and Access Control Systems for Cloud Native Architectures;2023 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA);2023-12-04

5. Enhancing Security in 5G Networks: A Hybrid Machine Learning Approach for Attack Classification;2023 20th ACS/IEEE International Conference on Computer Systems and Applications (AICCSA);2023-12-04
