Enabling Real-Time On-Chip Audio Super Resolution for Bone-Conduction Microphones

Author:

Li Yuang, Wang Yuntao, Liu Xin, Shi Yuanchun, Patel Shwetak, Shih Shao-Fu

Abstract

Voice communication using an air-conduction microphone in noisy environments suffers from degraded speech audibility. Bone-conduction microphones (BCMs) are robust against ambient noise but suffer from limited effective bandwidth due to their sensing mechanism. Although existing audio super-resolution algorithms can recover the lost high-frequency content to achieve high-fidelity audio, they require considerably more computational resources than are available in low-power hearable devices. This paper proposes the first-ever real-time on-chip speech audio super-resolution system for BCMs. To accomplish this, we built and compared a series of lightweight audio super-resolution deep-learning models. Among these models, ATS-UNet was the most cost-efficient because the proposed novel Audio Temporal Shift Module (ATSM) reduces the network’s dimensionality while maintaining sufficient temporal features from speech audio. We then quantized the ATS-UNet and deployed it to low-end ARM microcontroller units for a real-time embedded prototype. The evaluation results show that our system achieved real-time inference speed on Cortex-M7 and higher quality than the baseline audio super-resolution method. Finally, we conducted a user study with ten expert and ten amateur listeners to evaluate our method’s effectiveness for human listeners. Both groups perceived significantly higher speech quality with our method than with the original BCM or with an air-conduction microphone combined with cutting-edge noise-reduction algorithms.

Funder

the Natural Science Foundation of China

Tsinghua University Initiative Scientific Research Program

Beijing Key Lab of Networked Multimedia

the Institute for Guo Qiang, Tsinghua University

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

References: 53 articles.

Cited by 6 articles.

1. Regional Language Speech Recognition from Bone Conducted Speech Signals Through CCWT Algorithm;Circuits, Systems, and Signal Processing;2024-07-04

2. Multi-Microphone Noise Data Augmentation for DNN-Based Own Voice Reconstruction for Hearables in Noisy Environments;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

3. Restoration of Bone-Conducted Speech With U-Net-Like Model and Energy Distance Loss;IEEE Signal Processing Letters;2024

4. Edge Storage Management Recipe with Zero-Shot Data Compression for Road Anomaly Detection;2023 14th International Conference on Information and Communication Technology Convergence (ICTC);2023-10-11

5. Building energy consumption optimization method based on convolutional neural network and BIM;Alexandria Engineering Journal;2023-08
