The Effect of Different Occupational Background Noises on Voice Recognition Accuracy

Published: 2022-03-31
Volume: 22, Issue: 5
ISSN: 1530-9827
Container-title: Journal of Computing and Information Science in Engineering
Language: en

Authors:

Li Song¹ ², Yerebakan Mustafa Ozkan³, Luo Yue³, Amaba Ben⁴, Swope William⁵, Hu Boyi³

Affiliations:

1. University of Florida, Department of Computer and Information Science and Engineering, 303 Weil Hall, Gainesville, FL 32603

2. Johns Hopkins University, Department of Computer Science, 3400 N Charles Street, Baltimore, MD 21218

3. University of Florida, Department of Industrial and Systems Engineering, 303 Weil Hall, Gainesville, FL 32603

4. IBM, Blue Lagoon Drive, Miami, FL 33126

5. IBM, 410 Robin Hood Cir Unit 102, Naples, FL 34104

Abstract

Voice recognition has become an integral part of our lives and is commonly used in call centers and in virtual assistants. However, voice recognition is increasingly applied to industrial uses as well. Each of these use cases has unique characteristics that may affect the effectiveness of voice recognition, which in turn could affect industrial productivity, performance, or even safety. One of the most prominent is the background noise that dominates each industry, to which different machinery and different work layouts are primary contributors. Another important characteristic is the type of communication present in these settings: daily communication often involves longer sentences uttered under relatively quiet conditions, whereas communication in industrial settings is often short and conducted in loud conditions. In this study, we demonstrated the importance of accounting for these two elements by comparing the performance of two voice recognition algorithms under several background noise conditions: a regular Convolutional Neural Network (CNN)-based voice recognition algorithm and an Automatic Speech Recognition (ASR)-based model with a denoising module. Our results indicate a significant performance drop from the typically used background noise (white noise) to the other background noises. In addition, our custom ASR model with the denoising module outperformed the CNN-based model, with an overall performance increase of 14–35% across all background noises. Both results provide evidence that specialized voice recognition algorithms need to be developed for these environments before they can be reliably deployed as control mechanisms.
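The abstract does not say how the background-noise conditions were constructed. A common practice for this kind of comparison is to mix a recorded noise track (white noise, or an occupational recording) into clean speech at a chosen signal-to-noise ratio (SNR). The sketch below shows that technique under stated assumptions: the helper names `mix_at_snr` and `snr_db`, the 16 kHz sample rate, and the sine tone standing in for a spoken command are all hypothetical and are not taken from the paper.

```python
import numpy as np


def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add background noise to a speech clip at a target SNR in dB.

    Hypothetical helper. Assumes both inputs are 1-D float arrays recorded
    at the same sample rate. The noise is tiled or trimmed to the speech
    length, then scaled so that 10 * log10(P_speech / P_noise) == snr_db.
    """
    speech = np.asarray(speech, dtype=np.float64)
    noise = np.asarray(noise, dtype=np.float64)
    if speech.ndim != 1 or noise.ndim != 1 or noise.size == 0:
        raise ValueError("expected non-empty 1-D arrays")

    # Repeat the noise recording if it is shorter than the speech, then trim.
    reps = int(np.ceil(speech.size / noise.size))
    noise = np.tile(noise, reps)[: speech.size]

    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    if p_noise == 0:
        raise ValueError("noise segment is silent")

    # Scale the noise so its power becomes p_speech / 10^(snr_db / 10).
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise


def snr_db(speech: np.ndarray, noisy: np.ndarray) -> float:
    """Measure the SNR of a mixture, treating (noisy - speech) as the noise."""
    residual = noisy - speech
    return float(10 * np.log10(np.mean(speech ** 2) / np.mean(residual ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16_000  # assumed sample rate (Hz)
    t = np.arange(sr) / sr
    speech = 0.5 * np.sin(2 * np.pi * 220 * t)  # stand-in for a short spoken command
    white = rng.standard_normal(sr // 4)        # white-noise baseline condition
    noisy = mix_at_snr(speech, white, snr_db=0.0)
    print(snr_db(speech, noisy))
```

Swapping `white` for an occupational noise recording at the same SNR keeps the loudness comparable across conditions, so any accuracy gap reflects the spectral character of the noise rather than its level.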

Funder

National Science Foundation

Publisher

ASME International

Subject

Industrial and Manufacturing Engineering, Computer Graphics and Computer-Aided Design, Computer Science Applications, Software

Link

https://asmedigitalcollection.asme.org/computingengineering/article-pdf/22/5/050905/6869354/jcise_22_5_050905.pdf

References (52 articles; entries 1–5 shown)

1. Evaluation of Google’s Voice Recognition and Sentence Classification for Health Care Applications; Uddin; Eng. Manage. J., 2015

2. Towards Multimodal Emotion Recognition in German Speech Events in Cars Using Transfer Learning; Cevher, 2019

3. A Voice-Controlled Multi-Functional Smart Home Automation System; Mittal, 2015

4. Speech and Voice Recognition Market by Type (Speech and Voice Recognition), End User (Automotive, Healthcare, BFSI, Education, Legal), Technology (Artificial Intelligence and Non-Artificial Intelligence), and Geography: Global Forecast to 2025; Meticulous Market Research, 2019

5. Industrially Oriented Voice Control System; Rogowski; Robot. Comput.-Integr. Manuf., 2012

Cited by 5 articles

1. A Comprehensive Review of Auditory and Non-Auditory Effects of Noise on Human Health; Noise and Health; 2024-04

2. An embedded TensorFlow lite model for classification of chip images with respect to chip morphology depending on varying feed; Journal of Intelligent Manufacturing; 2024-02-23

3. Selection in Stride: Comparing Button- and Head-Based Augmented Reality Interaction During Locomotion; Communications in Computer and Information Science; 2024

4. Environment-Aware Knowledge Distillation for Improved Resource-Constrained Edge Speech Recognition; Applied Sciences; 2023-11-22

5. Characterizing information access needs in gaze-adaptive augmented reality interfaces: implications for fast-paced and dynamic usage contexts; Human–Computer Interaction; 2023-10-13