Wavoice: A mmWave-assisted Noise-resistant Speech Recognition System

Author:

Liu Tiantian, Wang Chao, Li Zhengxiong, Huang Ming-Chun, Xu Wenyao, Lin Feng

Affiliation:

1. T. Liu, C. Wang, and F. Lin are with ZJU-Hangzhou Global Scientific and Technological Innovation Center, the School of Cyber Science and Technology, Zhejiang University, China; Z. Li is with the Department of Computer Science and Engineering, University of Colorado Denver, United States; M. Huang is with the Department of Data and Computational Science, Duke Kunshan University, Jiangsu, 215316, China; and W. Xu is with the Department of Computer Science and Engineering, University at Buffalo,...

Abstract

As automatic speech recognition evolves, the deployment of voice user interfaces (VUIs) has expanded rapidly. Since the COVID-19 pandemic in particular, VUIs have gained more attention in online communication owing to their contact-free nature. However, VUIs are difficult to apply in public settings because ambient noise degrades the received audio signals. In this paper, we propose Wavoice, the first noise-resistant multi-modal speech recognition system that fuses two distinct voice sensing modalities: millimeter-wave (mmWave) signals and audio signals from a microphone. One key contribution is a model of the inherent correlation between mmWave and audio signals. Based on this model, Wavoice performs real-time, noise-resistant voice activity detection and targets the intended user among multiple speakers. Additionally, we elaborate on two novel modules for multi-modal fusion that are embedded into the neural network and lead to accurate speech recognition. Extensive experiments demonstrate the effectiveness of Wavoice under adverse conditions, with a character error rate below 1% within a range of 7 meters. In terms of robustness and accuracy, Wavoice considerably outperforms existing audio-only speech recognition methods, achieving a lower character error rate and word error rate.
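The abstract reports results as character error rate (CER) and word error rate (WER). As a minimal sketch of how these metrics are conventionally defined (edit distance between the reference and the recognized text, divided by the reference length), the following may help; the function names `edit_distance`, `cer`, and `wer` are illustrative assumptions, and this is not the authors' evaluation code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits / reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits / number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


if __name__ == "__main__":
    # Hypothetical transcript pair, chosen only for illustration.
    ref = "turn on the light"
    hyp = "turn on the night"
    print(f"CER = {cer(ref, hyp):.4f}")
    print(f"WER = {wer(ref, hyp):.4f}")
```

Under this definition, a single wrong character in a 17-character reference gives a CER of 1/17, while the same error counts as one wrong word out of four for WER, which is why WER is usually the higher of the two figures.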

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications

