Speech enhancement from fused features based on deep neural network and gated recurrent unit network

Published:2021-10-24 Issue:1 Volume:2021 Page:
ISSN:1687-6180
Container-title:EURASIP Journal on Advances in Signal Processing
language:en
Short-container-title:EURASIP J. Adv. Signal Process.

Author:

Wang Youming, Han Jiali, Zhang Tianqi, Qing Didi

Abstract

In real environments, speech is easily corrupted by external interference, which causes the loss of important features. Deep learning has become a popular approach to speech enhancement because of its strength in solving nonlinear mapping problems for complex features. However, traditional deep learning methods are weak at learning important information from previous time steps and long-term dependencies in time-series data. To overcome this problem, we propose a novel speech enhancement method based on the fused features of deep neural networks (DNNs) and a gated recurrent unit (GRU) network. The proposed method uses the GRU to reduce the number of DNN parameters and to acquire the context information of the speech, which improves the quality and intelligibility of the enhanced speech. First, a DNN with multiple hidden layers learns the mapping between the logarithmic power spectrum (LPS) features of noisy speech and those of clean speech. Second, the LPS features output by the DNN are fused with the noisy speech and used as the input of the GRU network to compensate for the missing context information. Finally, the GRU network learns the mapping between the fused LPS features and the LPS features of clean speech. The proposed model is compared experimentally with traditional speech enhancement models, including DNN, CNN, LSTM and GRU. Under matched noise conditions, the PESQ, SSNR and STOI of the proposed algorithm improve by 30.72%, 39.84% and 5.53%, respectively, compared with the noisy signal. Under unmatched noise conditions, the PESQ and STOI improve by 23.8% and 37.36%, respectively. The advantage of the proposed method is that it uses the key information in the features to suppress noise in both matched and unmatched noise cases, and it outperforms other common speech enhancement methods.
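The three stages described in the abstract (DNN mapping, feature fusion, GRU mapping) can be sketched as a data-flow example in NumPy. This is only an illustrative sketch, not the authors' implementation: the abstract gives no frame length, hop size, layer sizes, or fusion rule, so the 512-sample Hann frames, the 256-unit hidden layers, the 128-unit GRU, the use of concatenation for fusion, and every function name below are assumptions. The weights are random and untrained, so the output shows tensor shapes only, not enhanced speech.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_power_spectrum(signal, frame_len=512, hop=256, eps=1e-10):
    """Frame the signal, apply a Hann window, and return log|FFT|^2 per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) ** 2 + eps)

def dense(n_in, n_out):
    """Randomly initialised (untrained) weight matrix and zero bias."""
    return rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out)

def dnn_forward(x, layers):
    """Feed-forward stage: ReLU hidden layers followed by a linear output layer."""
    h = x
    for W, b in layers[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W, b = layers[-1]
    return h @ W + b

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_forward(x_seq, p, hidden):
    """Single-layer GRU over the frame sequence (biases omitted for brevity)."""
    h = np.zeros(hidden)
    states = []
    for x in x_seq:
        z = sigmoid(x @ p["Wz"] + h @ p["Uz"])           # update gate
        r = sigmoid(x @ p["Wr"] + h @ p["Ur"])           # reset gate
        cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])  # candidate state
        h = (1.0 - z) * h + z * cand                     # one common GRU convention
        states.append(h)
    return np.stack(states)

# Stand-in for one second of noisy speech at 16 kHz (assumed sampling rate).
noisy = rng.normal(size=16000)
noisy_lps = log_power_spectrum(noisy)
n_bins, hidden = noisy_lps.shape[1], 128

# Stage 1: the DNN maps noisy LPS to a first estimate of clean LPS.
dnn_layers = [dense(n_bins, 256), dense(256, 256), dense(256, n_bins)]
dnn_lps = dnn_forward(noisy_lps, dnn_layers)

# Stage 2: fuse the DNN output with the noisy features (concatenation is assumed).
fused = np.concatenate([dnn_lps, noisy_lps], axis=1)

# Stage 3: the GRU reads the fused frames in order; a linear layer outputs LPS.
gru_params = {k: rng.normal(0.0, 0.01,
                            (fused.shape[1] if k[0] == "W" else hidden, hidden))
              for k in ("Wz", "Wr", "Wh", "Uz", "Ur", "Uh")}
W_out, b_out = dense(hidden, n_bins)
enhanced_lps = gru_forward(fused, gru_params, hidden) @ W_out + b_out
```

In a trained system, both stages would be fitted against the LPS of the corresponding clean speech, and the enhanced waveform would typically be reconstructed from `enhanced_lps` together with the noisy phase; neither step is shown here.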

Funder

The Graduate Student Innovation Fund of Xi'an University of Posts and Telecommunications

The Key Research and Development Program of Shaanxi Province of China

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13634-021-00813-8.pdf

Reference: 27 articles.

1. P.C. Loizou, Speech Enhancement: Theory and Practice, 2nd edn. (CRC Press, Cambridge, 2013)

2. C. Valentini-Botinhao, J. Yamagishi, S. King, Evaluating speech intelligibility enhancement for HMM-based synthetic speech in noise (2012)

3. N. Moritz, T. Hori, J. Le Roux, Triggered attention for end-to-end speech recognition. In: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing (IEEE, 2019)

4. T.V. Sreenivas, P. Rao, Pitch extraction from corrupted harmonics of the power spectrum. J Acoust Soc Am 65(1), 223–228 (1979)

5. C. Fdlwa, V.A. de Moraes Weber, C. Gvm, et al., Recognition of Pantaneira cattle breed using computer vision and convolutional neural networks. Comput. Electron. Agric. 175

Cited by 12 articles.

1. DPHT-ANet: Dual-path high-order transformer-style fully attentional network for monaural speech enhancement;Applied Acoustics;2024-09

2. A DNN Based Adaptive Filter for Speech Enhancement;2024 Second International Conference on Data Science and Information System (ICDSIS);2024-05-17

3. Neferine Pretreatment Attenuates Isoproterenol-Induced Cardiac Injury Through Modulation of Oxidative Stress, Inflammation, and Apoptosis in Rats;Applied Biochemistry and Biotechnology;2024-03-25

4. A Subconvolutional U-net with Gated Recurrent Unit and Efficient Channel Attention Mechanism for Real-Time Speech Enhancement;Wireless Personal Communications;2024-03-04

5. Stacked Multiscale Densely Connected Temporal Convolutional Attention Network for Multi-Objective Speech Enhancement in an Airborne Environment;Aerospace;2024-02-15