Speech enhancement by LSTM-based noise suppression followed by CNN-based speech restoration

Published:2020-12 Issue:1 Volume:2020
ISSN:1687-6180
Container-title:EURASIP Journal on Advances in Signal Processing
language:en
Short-container-title:EURASIP J. Adv. Signal Process.

Author:

Strake Maximilian,Defraene Bruno,Fluyt Kristoff,Tirry Wouter,Fingscheidt Tim

Abstract

Single-channel speech enhancement in highly non-stationary noise conditions is a very challenging task, especially when interfering speech is included in the noise. Deep learning-based approaches have notably improved the performance of speech enhancement algorithms under such conditions, but still introduce speech distortions if strong noise suppression shall be achieved. We propose to address this problem by using a two-stage approach, first performing noise suppression and subsequently restoring natural sounding speech, using specifically chosen neural network topologies and loss functions for each task. A mask-based long short-term memory (LSTM) network is employed for noise suppression and speech restoration is performed via spectral mapping with a convolutional encoder-decoder network (CED). The proposed method improves speech quality (PESQ) over state-of-the-art single-stage methods by about 0.1 points for unseen highly non-stationary noise types including interfering speech. Furthermore, it is able to increase intelligibility in low-SNR conditions and consistently outperforms all reference methods.
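The difference between the two stages named in the abstract (masking, then spectral mapping) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `toy_mask_fn` and `toy_mapping_fn` are hypothetical stand-ins for the trained LSTM and CED networks, and the magnitude-spectrogram shape and noise-floor value are assumptions chosen only for illustration.

```python
import numpy as np


def toy_mask_fn(noisy_mag, noise_floor=0.5):
    # Hypothetical stand-in for the LSTM: a Wiener-like gain computed
    # from a fixed, assumed noise floor. Values lie in [0, 1).
    power = noisy_mag ** 2
    return power / (power + noise_floor ** 2)


def toy_mapping_fn(mag):
    # Hypothetical stand-in for the CED: 3-tap smoothing along the
    # frequency axis (axis 0), mapping a spectrum to a new spectrum.
    padded = np.pad(mag, ((1, 1), (0, 0)), mode="edge")
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def two_stage_enhance(noisy_mag, mask_fn, mapping_fn):
    # Stage 1 (mask-based): scale each time-frequency bin by a gain,
    # so the output can only attenuate the noisy input.
    mask = np.clip(mask_fn(noisy_mag), 0.0, 1.0)
    suppressed = mask * noisy_mag
    # Stage 2 (spectral mapping): predict magnitudes directly, so the
    # output is not bounded by the input and can fill in missing energy.
    restored = mapping_fn(suppressed)
    return suppressed, restored


# Assumed layout: (frequency bins, time frames) of spectral magnitudes.
rng = np.random.default_rng(0)
noisy_mag = np.abs(rng.normal(size=(129, 50)))
suppressed, restored = two_stage_enhance(noisy_mag, toy_mask_fn, toy_mapping_fn)
```

The sketch only shows the data flow: a mask multiplies the noisy spectrum bin by bin, whereas a mapping produces a new spectrum outright, which is why the second stage is free to restore speech components that the first stage attenuated.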

Funder

NXP Semiconductors, Product Line Voice and Audio Solutions, Belgium

Publisher

Springer Science and Business Media LLC

Link

http://link.springer.com/content/pdf/10.1186/s13634-020-00707-1.pdf

Reference: 64 articles.

1. Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 32(6), 1109–1121 (1984).

2. Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 33(2), 443–445 (1985).

3. P. Scalart, J. V. Filho, in Proc. of ICASSP. Speech enhancement based on a priori signal to noise estimation (IEEE, Atlanta, 1996), pp. 629–632.

4. T. Lotter, P. Vary, Speech enhancement by MAP spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Adv. Signal Process. 2005(7), 1110–1126 (2005).

5. C. Breithaupt, T. Gerkmann, R. Martin, in Proc. of ICASSP. A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing (IEEE, Las Vegas, 2008), pp. 4897–4900.

Cited by 31 articles.

1. Speech enhancement using deep complex convolutional neural network (DCCNN) model;Signal, Image and Video Processing;2024-08-14

2. Enhancing Cutting Sound Quality in Tool Wear Monitoring via Hybrid Domain Loss UNet Network;2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC);2024-07-02

3. CST-UNet: Cross Swin Transformer Enhanced U-Net with Masked Bottleneck for Single-Channel Speech Enhancement;Circuits, Systems, and Signal Processing;2024-06-16

4. Towards Efficient Recurrent Architectures: A Deep LSTM Neural Network Applied to Speech Enhancement and Recognition;Cognitive Computation;2024-04-30

5. CNN-LSTM architectures for non-stationary time series: decomposition approach;2024 International Conference on Global Aeronautical Engineering and Satellite Technology (GAST);2024-04-24