Abstract
Voice communication using an air-conduction microphone in noisy environments suffers from degraded speech audibility. Bone-conduction microphones (BCM) are robust against ambient noise but suffer from limited effective bandwidth due to their sensing mechanism. Although existing audio super-resolution algorithms can recover the high-frequency loss to achieve high-fidelity audio, they require considerably more computational resources than are available in low-power hearable devices. This paper proposes the first-ever real-time on-chip speech audio super-resolution system for BCM. To accomplish this, we built and compared a series of lightweight audio super-resolution deep-learning models. Among all these models, ATS-UNet was the most cost-efficient because the proposed novel Audio Temporal Shift Module (ATSM) reduces the network’s dimensionality while maintaining sufficient temporal features from speech audio. We then quantized and deployed the ATS-UNet to low-end ARM microcontroller units for a real-time embedded prototype. The evaluation results show that our system achieved real-time inference speed on Cortex-M7 and higher quality compared with the baseline audio super-resolution method. Finally, we conducted a user study with ten expert and ten amateur listeners to evaluate our method’s effectiveness for human listeners. Both groups perceived significantly higher speech quality with our method than with the original BCM or with an air-conduction microphone combined with cutting-edge noise-reduction algorithms.
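The abstract does not give the ATSM's exact formulation, but the general temporal-shift idea it builds on (shifting a fraction of feature channels forward or backward along the time axis, so temporal context is mixed with zero extra multiply-accumulates) can be sketched as follows. This is a minimal numpy illustration of a generic temporal shift over a (channels × time) feature map, not the paper's ATSM; the function name, shape convention, and `shift_fraction` parameter are assumptions for illustration.

```python
import numpy as np

def temporal_shift(x, shift_fraction=0.25):
    """Generic temporal-shift sketch (not the paper's exact ATSM).

    x: feature map of shape (channels, time).
    The first `n` channels are shifted one step forward in time,
    the next `n` channels one step backward, and the rest are left
    unchanged; vacated positions are zero-filled. The shift itself
    costs no multiplications, which is why such modules are cheap
    on microcontroller-class hardware.
    """
    c, t = x.shape
    n = int(c * shift_fraction)
    out = np.zeros_like(x)
    out[:n, 1:] = x[:n, :-1]          # shift forward (delay by one frame)
    out[n:2 * n, :-1] = x[n:2 * n, 1:]  # shift backward (advance by one frame)
    out[2 * n:] = x[2 * n:]           # remaining channels pass through
    return out
```

In shift-based architectures, a module like this is typically placed before a convolution so that each frame's features already contain information from neighboring frames, letting a smaller network capture temporal structure.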
Funder
the Natural Science Foundation of China
Tsinghua University Initiative Scientific Research Program
Beijing Key Lab of Networked Multimedia
the Institute for Guo Qiang, Tsinghua University
Subject
Electrical and Electronic Engineering, Biochemistry, Instrumentation, Atomic and Molecular Physics, and Optics, Analytical Chemistry
Cited by: 6 articles.