Speech Enhancement Using U-Net with Compressed Sensing

Published: 2022-04-20 Issue: 9 Volume: 12 Page: 4161
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en

Author:

Kang Zheng, Huang Zhihua, Lu Chenhua

Abstract

With the development of deep learning, speech enhancement based on deep neural networks has made great breakthroughs, and methods based on the U-Net structure have achieved good denoising performance. However, some of these methods rely on ordinary convolution operations, which may ignore the contextual information and detailed features of the input speech. To solve this issue, many studies have improved model performance by adding network modules such as attention mechanisms and long short-term memory (LSTM). In this work, a time-domain U-Net speech enhancement model that combines a lightweight Shuffle Attention mechanism with a compressed sensing loss (CS loss) is proposed. Time-domain dilated residual blocks are constructed and used for down-sampling and up-sampling in this model. Shuffle Attention is applied to the final output of the encoder to focus on speech features and suppress irrelevant audio information. A new loss is defined on the compressed sensing measurements of the clean speech and the enhanced speech; it can further remove noise from the noisy speech. In the experiments, ablation studies show the influence of different loss functions on model performance and verify the effectiveness of the CS loss. Compared with the reference models, the proposed model obtains higher speech quality and intelligibility scores with fewer parameters. When dealing with noise outside the dataset, the proposed model still achieves good denoising performance, which shows that it not only achieves a good enhancement effect but also generalizes well.
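The abstract describes the CS loss only as a loss defined on the compressed sensing measurements of clean and enhanced speech. The sketch below illustrates that idea under stated assumptions: a fixed random Gaussian sensing matrix Phi applied to non-overlapping frames, and a mean absolute (L1) distance between the resulting measurements. The frame length, number of measurements, matrix distribution, helper names (make_sensing_matrix, frame_signal, cs_loss, total_loss) and the weight alpha are all hypothetical and are not taken from the paper.

```python
import numpy as np


def make_sensing_matrix(m, n, seed=0):
    """Hypothetical fixed sensing matrix Phi of shape (m, n) with m < n."""
    rng = np.random.default_rng(seed)
    # i.i.d. Gaussian entries with variance 1/m: a common compressed sensing
    # choice, assumed here; the paper's actual measurement matrix may differ.
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))


def frame_signal(x, frame_len):
    """Split a 1-D waveform into non-overlapping frames, zero-padding the tail."""
    pad = (-len(x)) % frame_len
    return np.pad(x, (0, pad)).reshape(-1, frame_len)


def cs_loss(clean, enhanced, phi):
    """Mean absolute difference between the CS measurements of two waveforms."""
    n = phi.shape[1]
    y_clean = frame_signal(clean, n) @ phi.T   # (frames, m) measurements
    y_enh = frame_signal(enhanced, n) @ phi.T  # (frames, m) measurements
    return float(np.mean(np.abs(y_clean - y_enh)))


def total_loss(clean, enhanced, phi, alpha=0.5):
    """Hypothetical combination: time-domain L1 loss plus alpha times the CS loss."""
    time_l1 = float(np.mean(np.abs(clean - enhanced)))
    return time_l1 + alpha * cs_loss(clean, enhanced, phi)


if __name__ == "__main__":
    phi = make_sensing_matrix(m=128, n=512)  # 4x compression per frame (assumed)
    rng = np.random.default_rng(1)
    clean = rng.standard_normal(16000)       # synthetic stand-in for 1 s at 16 kHz
    enhanced = clean + 0.1 * rng.standard_normal(16000)
    print(cs_loss(clean, clean, phi))        # 0.0
    print(total_loss(clean, enhanced, phi) > 0)  # True
```

Because the measurements are a linear projection, the CS term sees the residual (clean minus enhanced) only through its projection onto the rows of Phi, which is why this sketch combines it with a direct waveform loss rather than using it alone.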

Funder

National Key R&D Program of China

Natural Science Foundation of Xinjiang Uygur Autonomous Region of China

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/9/4161/pdf

Cited by 9 articles.

1. Efficient Framework for Compressed Sensing of Speech Signals;2024 International Telecommunications Conference (ITC-Egypt);2024-07-22

2. Multichannel high noise level ECG denoising based on adversarial deep learning;Scientific Reports;2024-01-08

3. Speech Enhancement Using U-Net-Based Progressive Learning with Squeeze-TCN;Lecture Notes in Networks and Systems;2024

4. Algorithm for Quality Evaluation of Pregnant Women’s Abdominal Electrical Signals Based on CNN Networks;Advances in Clinical Medicine;2024

5. Speech refinement using Bi-LSTM and improved spectral clustering in speaker diarization;Multimedia Tools and Applications;2023-12-05