The Effect of Different Occupational Background Noises on Voice Recognition Accuracy

Author:

Li Song (1, 2), Yerebakan Mustafa Ozkan (3), Luo Yue (3), Amaba Ben (4), Swope William (5), Hu Boyi (3)

Affiliation:

1. University of Florida, Department of Computer and Information Science and Engineering, 303 Weil Hall, Gainesville, FL 32603

2. Johns Hopkins University, Department of Computer Science, 3400 N Charles Street, Baltimore, MD 21218

3. University of Florida, Department of Industrial and Systems Engineering, 303 Weil Hall, Gainesville, FL 32603

4. IBM, Blue Lagoon Drive, Miami, FL 33126

5. IBM, 410 Robin Hood Cir Unit 102, Naples, FL 34104

Abstract

Voice recognition has become an integral part of daily life, commonly used in call centers and in virtual assistants. It is also increasingly applied in industrial settings. Each of these use cases has unique characteristics that may affect the effectiveness of voice recognition and, in turn, industrial productivity, performance, or even safety. One of the most prominent is the background noise that dominates each industry, to which different machinery and work layouts are the primary contributors. Another important characteristic is the type of communication in these settings: daily communication often involves longer sentences uttered under relatively quiet conditions, whereas communication in industrial settings is often short and conducted in loud conditions. In this study, we demonstrate the importance of taking these two elements into account by comparing the performance of two voice recognition algorithms under several background noise conditions: a regular convolutional neural network (CNN)-based voice recognition algorithm and an automatic speech recognition (ASR)-based model with a denoising module. Our results indicate a significant performance drop between the typically used background noise (white noise) and the other background noises. In addition, our custom ASR model with the denoising module outperformed the CNN-based model, with an overall performance increase of 14–35% across all background noises. Both results provide evidence that specialized voice recognition algorithms need to be developed for these environments before they can be reliably deployed as control mechanisms.

Funder

National Science Foundation

Publisher

ASME International

Subject

Industrial and Manufacturing Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications, Software

