Supervised Single Channel Speech Enhancement Based on Dual-Tree Complex Wavelet Transforms and Nonnegative Matrix Factorization Using the Joint Learning Process and Subband Smooth Ratio Mask

Published:2019-03-22 Issue:3 Volume:8 Page:353
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Islam Md Shohidul, Al Mahmud Tarek Hasan, Khan Wasim Ullah, Ye Zhongfu

Abstract

In this paper, we propose a novel speech enhancement method based on dual-tree complex wavelet transforms (DTCWT) and nonnegative matrix factorization (NMF) that exploits the subband smooth ratio mask (ssRM) through a joint learning process. The discrete wavelet packet transform (DWPT) lacks shift invariance because of the downsampling that follows filtering, which leaves significant noise in the reconstructed signal. The redundant stationary wavelet transform (SWT) can solve this shift invariance problem, but at the cost of redundancy. We therefore use the efficient DTCWT, which offers shift invariance with limited redundancy, and calculate the ratio masks (RMs) between the clean training speech and noisy speech (i.e., training noise mixed with clean speech). We also compute RMs between the noise and noisy speech and then learn both RMs with their corresponding clean training speech and noise. An auto-regressive moving average (ARMA) filter is applied to the previously generated matrices before NMF to smooth the decomposition. The ssRM is proposed to exploit the advantages of jointly using the standard ratio mask (sRM) and the square root ratio mask (srRM). In short, the DTCWT decomposes the time-domain signal into a set of subband signals. Subsequently, a framing scheme is applied to each subband signal to form matrices, and the RMs are calculated before being concatenated with the previously generated matrices. The ARMA filter is applied to the nonnegative matrix formed by taking absolute values. Through ssRM, speech components are detected using NMF in each newly formed matrix. Finally, the enhanced speech signal is obtained via the inverse DTCWT (IDTCWT). Performance is evaluated on an IEEE corpus, the GRID audio-visual corpus, and different types of noise. The proposed approach significantly improves objective speech quality and intelligibility and outperforms the conventional STFT-NMF, DWPT-NMF, and DNN-IRM methods.
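
As a rough illustration of the two ratio masks named in the abstract, the sketch below computes a standard ratio mask and its square root from two small nonnegative magnitude matrices. The formulas used here, |S| / (|S| + |N|) and its element-wise square root, are common textbook forms and are an assumption; the abstract does not give the paper's exact definitions, and the rule that combines sRM and srRM into the ssRM is not stated there, so it is not reproduced. The function names, the `EPS` constant, and the toy values are hypothetical.

```python
import math

EPS = 1e-12  # assumed guard against division by zero in silent bins


def standard_ratio_mask(speech_mag, noise_mag):
    """Element-wise |S| / (|S| + |N|) for two equally shaped 2-D lists.

    Assumed textbook form of a ratio mask, not necessarily the paper's sRM.
    """
    return [
        [s / (s + n + EPS) for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_mag, noise_mag)
    ]


def square_root_ratio_mask(speech_mag, noise_mag):
    """Element-wise square root of the standard ratio mask (assumed srRM form)."""
    return [
        [math.sqrt(value) for value in row]
        for row in standard_ratio_mask(speech_mag, noise_mag)
    ]


# Toy magnitude matrices (rows: coefficients within a frame, columns: frames).
speech = [[3.0, 0.0], [1.0, 2.0]]
noise = [[1.0, 2.0], [1.0, 0.0]]

srm = standard_ratio_mask(speech, noise)
srrm = square_root_ratio_mask(speech, noise)
```

Both masks stay in the range [0, 1]. For any value strictly between 0 and 1, the square root version is larger than the standard one, so it attenuates mixed speech and noise bins less aggressively.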

Funder

National Natural Science Foundation of China

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/8/3/353/pdf

References: 43 articles.

1. Suppression of acoustic noise in speech using spectral subtraction

2. Spectral subtraction based on two-stage spectral estimation and modified cepstrum thresholding

3. Speech enhancement using a soft-decision noise suppression filter

4. Speech Enhancement by MAP Spectral Amplitude Estimation Using a Super-Gaussian Speech Model

5. Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator

Cited by 9 articles.

1. Single-channel Speech Separation Based on Double-density Dual-tree CWT and SNMF; Annals of Emerging Technologies in Computing; 2024-01-01

2. Speech Enhancement Based on Discrete Wavelet Packet Transform and Itakura-Saito Nonnegative Matrix Factorisation; Archives of Acoustics; 2023-07-26

3. Dual transform based joint learning single channel speech separation using generative joint dictionary learning; Multimedia Tools and Applications; 2022-04-02

4. Robust Dual Domain Twofold Encrypted Image-in-Audio Watermarking Based on SVD; Circuits, Systems, and Signal Processing; 2021-03-23

5. Semi-supervised transient noise suppression using OMLSA and SNMF algorithms; Applied Acoustics; 2020-12