Supervised Single Channel Speech Enhancement Method Using UNET

Published:2023-07-12 Issue:14 Volume:12 Page:3052
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Hossain Md. Nahid¹, Basir Samiul¹, Hosen Md. Shakhawat¹, Asaduzzaman A.O.M.¹, Islam Md. Mojahidul¹, Hossain Mohammad Alamgir¹, Islam Md Shohidul¹²

Affiliation:

1. Department of Computer Science and Engineering, Islamic University, Kushtia 7003, Bangladesh

2. Hong Kong Centre for Cerebro-Cardiovascular Health Engineering (COCHE), The City University of Hong Kong, Kowloon, Hong Kong

Abstract

This paper proposes a single-channel supervised speech enhancement (SE) method based on UNET, a convolutional neural network (CNN) architecture that modifies the basic CNN design. In the training phase, the short-time Fourier transform (STFT) is applied to the noisy time-domain signal to obtain a noisy time-frequency representation, called the complex noisy matrix. The real and imaginary parts of the complex noisy matrix are concatenated to form the noisy concatenated matrix. UNET is applied to the noisy concatenated matrix to extract the speech components, and the CNN model is trained. In the testing phase, the same procedure is applied to the noisy time-domain signal to construct another noisy concatenated matrix, which is passed through the pre-trained (saved) model to produce an enhanced concatenated matrix. The real and imaginary parts are then separated from the enhanced concatenated matrix to form an enhanced complex matrix, from which the magnitude and phase are extracted. The inverse STFT (ISTFT) uses that magnitude and phase to generate the enhanced speech signal. The proposed method is evaluated on the IEEE databases with various types of noise, both stationary and non-stationary. The experimental results are compared with those of five other methods: STFT-sparse non-negative matrix factorization (SNMF), dual-tree complex wavelet transform (DTCWT)-SNMF, DTCWT-STFT-SNMF, STFT-convolutional denoising autoencoder (CDAE), and the causal multi-head attention mechanism (CMAM) for speech enhancement. The proposed algorithm generally improves speech quality and intelligibility at all considered signal-to-noise ratios (SNRs) and outperforms the five competing algorithms on every evaluation metric.
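The test-phase data flow described in the abstract can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the frame length (256), hop size (128), Hann window, and the choice to stack real and imaginary parts along the frequency axis are all assumed here, and the function names (`stft`, `istft`, `to_concatenated`, `from_concatenated`, `enhance`) are hypothetical. The trained UNET is replaced by an identity placeholder, so the sketch shows only the matrix bookkeeping around the model.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Hann-windowed STFT; returns a complex matrix (n_fft // 2 + 1, frames)."""
    window = np.hanning(n_fft + 1)[:-1]   # periodic Hann: overlaps sum to 1 at 50%
    x = np.pad(x, (n_fft, n_fft))         # pad so every sample is fully covered
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def istft(Z, length, n_fft=256, hop=128):
    """Overlap-add inverse of stft() above; returns `length` samples."""
    frames = np.fft.irfft(Z.T, n=n_fft, axis=1)
    out = np.zeros(n_fft + hop * (frames.shape[0] - 1))
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out[n_fft:n_fft + length]      # drop the padding added in stft()

def to_concatenated(Z):
    """Stack real and imaginary parts (assumed axis: frequency)."""
    return np.concatenate([Z.real, Z.imag], axis=0)

def from_concatenated(C):
    """Split a concatenated matrix back into a complex matrix."""
    n_bins = C.shape[0] // 2
    return C[:n_bins] + 1j * C[n_bins:]

def enhance(noisy, model, n_fft=256, hop=128):
    """Noisy signal -> concatenated matrix -> model -> enhanced signal."""
    c_noisy = to_concatenated(stft(noisy, n_fft, hop))
    c_enhanced = model(c_noisy)           # a trained UNET would run here
    z_enhanced = from_concatenated(c_enhanced)
    magnitude, phase = np.abs(z_enhanced), np.angle(z_enhanced)
    return istft(magnitude * np.exp(1j * phase), len(noisy), n_fft, hop)

# Placeholder model (assumption): identity mapping, NOT a trained network.
identity_model = lambda c: c

rng = np.random.default_rng(0)
noisy = rng.standard_normal(4000)         # stand-in for a noisy utterance
enhanced = enhance(noisy, identity_model)
concat_shape = to_concatenated(stft(noisy)).shape
```

With the identity placeholder the output reproduces the input up to floating-point error, which is a convenient sanity check on the STFT, concatenation, and ISTFT steps before a real model is plugged in.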

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering,Computer Networks and Communications,Hardware and Architecture,Signal Processing,Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/12/14/3052/pdf

References (40 articles)

1. Loizou, P. (2013). Speech Enhancement: Theory and Practice, CRC Press.

2. Ephraim (1985). Speech Enhancement Using a Minimum Mean-Square Error Log-Spectral Amplitude Estimator. IEEE Trans. Acoust. Speech Signal Process.

3. Cohen, I., and Gannot, S. (2008). Springer Handbook of Speech Processing, Springer.

4. Hao, X., and Li, X. (2022). Fast FullSubNet: Accelerate Full-Band and Sub-Band Fusion Model for Single-Channel Speech Enhancement. arXiv.

5. Ping (2022). Single-Channel Speech Enhancement Using Improved Progressive Deep Neural Network and Masking-Based Harmonic Regeneration. Speech Commun.

Cited by 3 articles.

1. BIUnet: Unet Network for Mask Estimation in Single-Channel Speech Enhancement;2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT);2024-03-29

2. CNN-based noise reduction for multi-channel speech enhancement system with discrete wavelet transform (DWT) preprocessing;PeerJ Computer Science;2024-02-28

3. Spatio-Temporal Features Representation Using Recurrent Capsules for Monaural Speech Enhancement;IEEE Access;2024