Speech Enhancement Using Joint DNN-NMF Model Learned with Multi-Objective Frequency Differential Spectrum Loss Function

Published:2024-01-24 Issue: Volume:2024 Page:1-10
ISSN:1751-9683
Container-title:IET Signal Processing
language:en
Short-container-title:IET Signal Processing

Author:

Pashaian Matin¹, Seyedin Sanaz¹

Affiliation:

1. Speech Processing Research Lab, Department of Electrical Engineering, Amirkabir University of Technology (Tehran Polytechnique), Tehran, Iran

Abstract

We propose a multi-objective joint model of non-negative matrix factorization (NMF) and deep neural network (DNN) with a new loss function for speech enhancement. The proposed loss function ($L_{MOFD}$) is a weighted combination of a frequency differential spectrum mean squared error (MSE)-based loss function ($L_{FD}$) and a multi-objective MSE loss function ($L_{MO}$). The conventional MSE loss function computes the discrepancy between the estimated speech and clean speech across all frequencies, disregarding how the amplitude changes along the frequency axis, which carries valuable information. The differential spectrum representation retains spectral peaks that carry important information, so using it helps to ensure that this information in the speech signal is preserved. In addition, noise spectra typically have a flat shape, and because the differential operation drives the flat parts of the spectrum close to zero, the differential spectrum is resistant to noises with smooth structures. Thus, we propose using a frequency-differentiated loss function that considers the magnitude spectrum differences between neighboring frequency bins in each time frame. This approach maintains the spectral variations of the target signal in the frequency domain, which can effectively reduce the degrading effects of noise. The multi-objective MSE term ($L_{MO}$) combines two loss functions: one on the NMF coefficients, which are the intermediate output targets, and one on the original spectral signals, which are the actual output targets. The use of encoded NMF coefficients as low-dimensional structural features for the DNN serves as prior knowledge and helps the learning process. $L_{MO}$ is used alongside $L_{FD}$ to take advantage of the properties of both the original and the differential spectrum in the training loss function. Moreover, a DNN-based noise classification and fusion strategy (NCF) is proposed to exploit a discriminative model for noise reduction. The experiments reveal the improvements of the proposed approach compared to the previous methods.
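The loss structure described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the exact formulas, so everything here is an assumption made for illustration: the function names, the first-order difference along the frequency axis as the "differential spectrum", and the convex weights `alpha` and `beta` are hypothetical and may differ from the definitions in the paper.

```python
import numpy as np

def frequency_differential(spec):
    """Difference between neighboring frequency bins in each time frame.

    spec: magnitude spectrogram of shape (frames, bins).
    Assumption: a plain first-order difference along the frequency axis.
    """
    return np.diff(spec, axis=1)

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))

def fd_loss(est_spec, clean_spec):
    """Sketch of L_FD: MSE between the differential spectra."""
    return mse(frequency_differential(est_spec),
               frequency_differential(clean_spec))

def mo_loss(est_coef, clean_coef, est_spec, clean_spec, beta=0.5):
    """Sketch of L_MO: MSE on NMF coefficients (intermediate targets)
    plus MSE on the spectra (actual targets). beta is a hypothetical weight."""
    return beta * mse(est_coef, clean_coef) + (1.0 - beta) * mse(est_spec, clean_spec)

def mofd_loss(est_coef, clean_coef, est_spec, clean_spec, alpha=0.5, beta=0.5):
    """Sketch of L_MOFD: weighted combination of L_FD and L_MO.
    alpha is a hypothetical weight; the paper's weighting may differ."""
    return (alpha * fd_loss(est_spec, clean_spec)
            + (1.0 - alpha) * mo_loss(est_coef, clean_coef,
                                      est_spec, clean_spec, beta))
```

Under these assumptions, an estimate that differs from the clean spectrum only by a flat offset gives a zero differential loss while the plain MSE stays nonzero, which reflects the abstract's point that the differential spectrum is insensitive to flat spectral components.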

Funder

Iran National Science Foundation

Publisher

Institution of Engineering and Technology (IET)

Link

http://downloads.hindawi.com/journals/ietsp/2024/8881007.pdf

Cited by 2 articles.

1. Using deep learning method to predict dimensionless values of stress intensity factors and T‐stress of edge notch disk bend (ENDB) specimen;Fatigue & Fracture of Engineering Materials & Structures;2024-05-13

2. Speech Enhancement Based on a Joint Two-Stage CRN+DNN-DEC Model and a New Constrained Phase-Sensitive Magnitude Ratio Mask;IEEE Access;2024