Auditory filterbank denoising neural network for speech enhancement in wearable auditory device

Published:2024-05 Issue:10 Volume:60
ISSN:0013-5194
Container-title:Electronics Letters
language:en

Author:

Kim Seon Man¹

Affiliation:

1. Spatial Optical Information Research Center, Korea Photonics Technology Institute, Gwangju, Republic of Korea

Abstract

In this study, a speech-enhancing neural network (NN) is proposed for monaural auditory devices, specifically hearing aids. A 32-channel auditory filterbank (FB) is first implemented with an algorithmic processing delay of 8 ms, tailored to meet the requirements of auditory devices. The proposed method integrates a denoising NN into the analysis stage of a uniform polyphase discrete Fourier transform (DFT) FB so that speech is enhanced within each band. The denoising model uses complex-valued convolutional NNs, which target the restoration of speech phase information from the spectral components of the DFT. A multi-loss method is also introduced that leverages the DFT FB structure: during training, it additionally accounts for the loss of the analysed speech signals within the split bands. To evaluate the proposed method, objective speech intelligibility and quality scores are measured under various noise conditions. The results show that the proposed method can outperform the existing method across all tested noise types.
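The analysis stage and the band-aware loss described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: only the 32-band count comes from the abstract, while the prototype length (`NUM_TAPS`), the hop size (`HOP`), the windowed-sinc prototype, the function names, and the exact form and weighting of `multi_loss` are assumptions made for illustration. The denoising NN itself is omitted.

```python
import numpy as np

NUM_BANDS = 32   # from the abstract: 32-channel filterbank
NUM_TAPS = 128   # assumed prototype filter length (not stated in the abstract)
HOP = 16         # assumed decimation factor (not stated in the abstract)


def prototype_lowpass(num_taps=NUM_TAPS, num_bands=NUM_BANDS):
    """Hann-windowed sinc low-pass with cutoff near pi / num_bands (assumed design)."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    return np.sinc(n / num_bands) / num_bands * np.hanning(num_taps)


def dft_fb_analysis(x, h, num_bands=NUM_BANDS, hop=HOP):
    """Uniform DFT-modulated analysis filterbank in polyphase (fold-then-FFT) form.

    Each frame is weighted by the prototype h, folded into num_bands
    polyphase sums, and transformed by a DFT across the branches.
    Returns a complex array of shape (num_frames, num_bands).
    """
    taps = len(h)
    assert taps % num_bands == 0, "prototype length must be a multiple of num_bands"
    num_frames = 1 + (len(x) - taps) // hop
    out = np.empty((num_frames, num_bands), dtype=complex)
    for m in range(num_frames):
        seg = x[m * hop : m * hop + taps] * h
        folded = seg.reshape(-1, num_bands).sum(axis=0)  # polyphase folding
        out[m] = np.fft.fft(folded)                      # DFT modulation
    return out


def multi_loss(est, ref, h, band_weight=0.5):
    """Hypothetical multi-loss: full-band MSE plus a weighted MSE over the split bands."""
    full_band = np.mean((est - ref) ** 2)
    diff = dft_fb_analysis(est, h) - dft_fb_analysis(ref, h)
    split_band = np.mean(np.abs(diff) ** 2)
    return full_band + band_weight * split_band


# Usage: a tone centred on band 5 should concentrate its energy in that band.
h = prototype_lowpass()
n = np.arange(2048)
tone = np.cos(2 * np.pi * 5 * n / NUM_BANDS)
bands = dft_fb_analysis(tone, h)
strongest = int(np.argmax(np.abs(bands[:, : NUM_BANDS // 2 + 1]).mean(axis=0)))
print(bands.shape, strongest)
```

In a setup like the one the abstract describes, the complex-valued band signals returned by `dft_fb_analysis` would be the input to the denoising network, and a synthesis stage (not shown) would recombine the enhanced bands into a time-domain signal.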

Publisher

Institution of Engineering and Technology (IET)
