EarSpeech: Exploring In-Ear Occlusion Effect on Earphones for Data-efficient Airborne Speech Enhancement

Authors:

Han Feiyu¹, Yang Panlong², Zuo You³, Shang Fei³, Xu Fenglei⁴, Li Xiang-Yang³

Affiliation:

1. Nanjing University of Information Science and Technology, Nanjing, China and University of Science and Technology of China, Hefei, China

2. Nanjing University of Information Science and Technology, Nanjing, China

3. University of Science and Technology of China, Hefei, China

4. Suzhou University of Science and Technology, Jiangsu Industrial Intelligent and Low-carbon Technology Engineering Center, Suzhou, China

Abstract

Earphones have become a popular device for voice input and interaction. However, airborne speech is susceptible to ambient noise, making it necessary to improve the quality and intelligibility of speech captured by earphones in noisy conditions. As the dual-microphone structure (i.e., outer and in-ear microphones) has been widely adopted in earphones (especially ANC earphones), we design EarSpeech, which exploits in-ear acoustic sensing as a complementary modality to enable airborne speech enhancement. The key idea of EarSpeech is that in-ear speech is less sensitive to ambient noise and exhibits a correlation with airborne speech. However, due to the occlusion effect, in-ear speech has limited bandwidth, making it challenging to correlate directly with full-band airborne speech. Therefore, we exploit the occlusion effect to carry out theoretical modeling and quantitative analysis of this cross-channel correlation and study how to leverage it for speech enhancement. Specifically, we design a series of methodologies, including data augmentation, deep learning-based fusion, and a noise mixture scheme, to improve the generalization, effectiveness, and robustness of EarSpeech, respectively. Lastly, we conduct real-world experiments to evaluate the performance of our system. EarSpeech achieves an average improvement ratio of 27.23% and 13.92% in terms of PESQ and STOI, respectively, and significantly improves SI-SDR by 8.91 dB. Benefiting from data augmentation, EarSpeech achieves comparable performance with a small-scale dataset 40 times smaller than the original one. In addition, we validate generalization across different users, speech content, and language types, as well as robustness in the real world, via comprehensive experiments. An audio demo of EarSpeech is available at https://github.com/EarSpeech/earspeech.github.io/.
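The abstract reports enhancement gains in SI-SDR (scale-invariant signal-to-distortion ratio). As a point of reference for that metric, the sketch below is a minimal NumPy implementation of the standard SI-SDR definition; it is not the authors' evaluation code, and the signals here are synthetic placeholders.

```python
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray) -> float:
    """Scale-Invariant Signal-to-Distortion Ratio in dB.

    Projects the estimate onto the (zero-mean) reference to factor out
    any global gain, then measures the residual distortion energy.
    """
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Optimal scaling of the reference toward the estimate
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    distortion = estimate - target
    return 10.0 * np.log10(np.sum(target**2) / np.sum(distortion**2))

# Synthetic example: 1 s of "speech" at 16 kHz plus Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
print(si_sdr(noisy, clean))  # close to 20 dB for 10%-amplitude noise
```

An enhancement system's gain, such as the 8.91 dB reported above, is the difference between the SI-SDR of its output and that of the noisy input, both measured against the clean reference.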

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)
