Speech enhancement by LSTM-based noise suppression followed by CNN-based speech restoration

Author:

Strake Maximilian,Defraene Bruno,Fluyt Kristoff,Tirry Wouter,Fingscheidt Tim

Abstract

AbstractSingle-channel speech enhancement in highly non-stationary noise conditions is a very challenging task, especially when interfering speech is included in the noise. Deep learning-based approaches have notably improved the performance of speech enhancement algorithms under such conditions, but still introduce speech distortions if strong noise suppression shall be achieved. We propose to address this problem by using a two-stage approach, first performing noise suppression and subsequently restoring natural sounding speech, using specifically chosen neural network topologies and loss functions for each task. A mask-based long short-term memory (LSTM) network is employed for noise suppression and speech restoration is performed via spectral mapping with a convolutional encoder-decoder network (CED). The proposed method improves speech quality (PESQ) over state-of-the-art single-stage methods by about 0.1 points for unseen highly non-stationary noise types including interfering speech. Furthermore, it is able to increase intelligibility in low-SNR conditions and consistently outperforms all reference methods.

Funder

NXP Semiconductors, Product Line Voice and Audio Solutions, Belgium

Publisher

Springer Science and Business Media LLC

Reference64 articles.

1. Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech, Signal Process.32(6), 1109–1121 (1984).

2. Y. Ephraim, D. Malah, Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Sig. Process.33(2), 443–445 (1985).

3. P. Scalart, J. V. Filho, in Proc. of ICASSP. Speech enhancement based on a priori signal to noise estimation (IEEEAtlanta, 1996), pp. 629–632.

4. T. Lotter, P. Vary, Speech enhancement by map spectral amplitude estimation using a super-Gaussian speech model. EURASIP J. Adv. Sig. Process.2005(7), 1110–1126 (2005).

5. C. Breithaupt, T. Gerkmann, R. Martin, in Proc. of ICASSP. A novel a priori SNR estimation approach based on selective cepstro-temporal smoothing (IEEELas Vegas, 2008), pp. 4897–4900.

Cited by 31 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. Speech enhancement using deep complex convolutional neural network (DCCNN) model;Signal, Image and Video Processing;2024-08-14

2. Enhancing Cutting Sound Quality in Tool Wear Monitoring via Hybrid Domain Loss UNet Network;2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC);2024-07-02

3. CST-UNet: Cross Swin Transformer Enhanced U-Net with Masked Bottleneck for Single-Channel Speech Enhancement;Circuits, Systems, and Signal Processing;2024-06-16

4. Towards Efficient Recurrent Architectures: A Deep LSTM Neural Network Applied to Speech Enhancement and Recognition;Cognitive Computation;2024-04-30

5. CNN-LSTM architectures for non-stationary time series: decomposition approach;2024 International Conference on Global Aeronautical Engineering and Satellite Technology (GAST);2024-04-24

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3