Multi-Task Learning U-Net for Single-Channel Speech Enhancement and Mask-Based Voice Activity Detection

Published:2020-05-06 Issue:9 Volume:10 Page:3230
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Lee Geon Woo, Kim Hong Kook

Abstract

This paper proposes a multi-task learning U-shaped neural network (MTU-Net) and applies it to single-channel speech enhancement (SE). The proposed MTU-Net-based SE method estimates an ideal binary mask (IBM) or an ideal ratio mask (IRM) by extending the decoding network of a conventional U-Net so that it simultaneously models the speech and noise spectra as targets. The effectiveness of the proposed SE method was evaluated under both matched and mismatched noise conditions between training and testing, using the perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) as measures. With the IRM, the proposed SE method achieved average PESQ scores that were 0.17, 0.52, and 0.40 higher than those of other state-of-the-art deep-learning-based methods, namely the deep recurrent neural network (DRNN), the SE generative adversarial network (SEGAN), and the conventional U-Net, respectively. In addition, its STOI scores were 0.07, 0.05, and 0.05 higher than those of the DRNN, SEGAN, and U-Net, respectively. A voice activity detection (VAD) method is also proposed that uses the IRM estimated by the MTU-Net-based SE method; it is fundamentally an unsupervised method without any model training. The performance of the proposed VAD method was then compared with that of supervised learning-based methods using a deep neural network (DNN), a boosted DNN, and a long short-term memory (LSTM) network. The proposed VAD method performed slightly better than the three neural network-based methods under mismatched noise conditions.
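
The two mask targets and the mask-based VAD idea named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the mask formulas are the commonly used textbook definitions, and the function names, the 0 dB IBM threshold, and the frame-averaging VAD rule with a 0.5 threshold are all assumptions made here for illustration. In practice the IRM would come from the trained network rather than from known clean speech and noise.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero and log of zero


def ideal_binary_mask(speech_mag, noise_mag, threshold_db=0.0):
    """IBM: 1 where the local SNR in dB exceeds the threshold, else 0.

    The 0 dB default threshold is an assumption, not a value from the paper.
    """
    snr_db = 20.0 * np.log10((speech_mag + EPS) / (noise_mag + EPS))
    return (snr_db > threshold_db).astype(np.float64)


def ideal_ratio_mask(speech_mag, noise_mag):
    """IRM in its common square-root form: sqrt(S^2 / (S^2 + N^2))."""
    s2 = speech_mag ** 2
    n2 = noise_mag ** 2
    return np.sqrt(s2 / (s2 + n2 + EPS))


def mask_based_vad(mask, threshold=0.5):
    """Hypothetical frame-level VAD: mark a frame as speech when the
    mask averaged over frequency exceeds the threshold.

    `mask` has shape (frequency_bins, frames).
    """
    return mask.mean(axis=0) > threshold


# Toy magnitude spectrograms: 2 frequency bins x 2 frames.
# Frame 0 contains speech, frame 1 contains noise only.
speech = np.array([[3.0, 0.0],
                   [4.0, 0.0]])
noise = np.ones((2, 2))

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
vad = mask_based_vad(irm)
```

Because the VAD decision here is a fixed rule applied to the mask, it needs no separate training, which matches the abstract's description of the VAD as unsupervised; the specific decision rule used in the paper may differ.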

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/10/9/3230/pdf

Reference 40 articles.

1. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator

2. Monaural Sound Source Separation by Nonnegative Matrix Factorization With Temporal Continuity and Sparseness Criteria

3. Joint optimization of masks and deep recurrent neural networks for monaural source separation;Huang;IEEE/ACM Trans. Audio, Speech Lang. Process.;2015

Cited by 21 articles.

1. Supervised Single Channel Speech Enhancement Method Using UNET;Electronics;2023-07-12

2. Teacher-Student Training Approach Using an Adaptive Gain Mask for LSTM-Based Speech Enhancement in the Airborne Noise Environment;Chinese Journal of Electronics;2023-07

3. Liver PDFF estimation using a multi-decoder water-fat separation neural network with a reduced number of echoes;European Radiology;2023-04-04

4. Frequency of Interest-based Noise Attenuation Method to Improve Anomaly Detection Performance;2023 IEEE International Conference on Big Data and Smart Computing (BigComp);2023-02

5. Brain Tumor Segmentation through Level Based Learning Model;Computer Systems Science and Engineering;2023