Ideal ratio mask estimation using supervised DNN approach for target speech signal enhancement

Published:2022-02-02 Issue:3 Volume:42 Page:1869-1883
ISSN:1064-1246
Container-title:Journal of Intelligent & Fuzzy Systems
Short-container-title:IFS

Author:

Selvaraj Poovarasan¹,Chandra E.¹

Affiliation:

1. Department of Computer Science, Bharathiar University, Coimbatore

Abstract

The most challenging task in recent Speech Enhancement (SE) systems is removing non-stationary noise and additive white Gaussian noise in real-time applications. Several proposed SE techniques have failed to eliminate noise from speech signals in real-time scenarios because of their high resource utilization. To reduce this difficulty, a Sliding Window Empirical Mode Decomposition including a Variant of Variational Model Decomposition and Hurst (SWEMD-VVMDH) technique was developed. However, it is a statistical framework that requires long computation times. In this article, the SWEMD-VVMDH technique is therefore extended with a Deep Neural Network (DNN) that efficiently learns the speech signals decomposed by SWEMD-VVMDH to achieve SE. First, the noisy speech signals are decomposed into Intrinsic Mode Functions (IMFs) by the SWEMD Hurst (SWEMDH) technique. Then, Time-Delay Estimation (TDE)-based VVMD is performed on the IMFs to select the most relevant IMFs according to the Hurst exponent and to reduce both the low- and high-frequency noise components in the speech signal. For each signal frame, the target features are selected and fed to the DNN, which learns these features to estimate the Ideal Ratio Mask (IRM) in a supervised manner. The DNN's capabilities are enhanced with respect to the category of background noise and the Signal-to-Noise Ratio (SNR) of the speech signals. The noise-category and SNR dimensions are chosen for training and testing multiple DNNs because these dimensions are often taken into account in SE systems. Further, the IRM in each frequency channel for all noisy signal samples is concatenated to reconstruct the noiseless speech signal. Finally, the experimental outcomes show considerable improvement in SE under different categories of noise.
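
For readers unfamiliar with the IRM, the following is a minimal sketch of how such a mask is commonly defined as a training target, not the authors' implementation. It assumes the widely used form IRM = (S / (S + N))^β with β = 0.5, where S and N are the clean-speech and noise power in each time-frequency unit; the abstract does not state which variant the paper uses. The function names, the `eps` guard, and the toy arrays are all hypothetical.

```python
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Compute an IRM per time-frequency unit as (S / (S + N)) ** beta.

    Assumption: beta = 0.5 is a common choice in the literature; the
    abstract does not specify the exponent used in the paper.
    """
    speech_power = np.asarray(speech_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    # eps avoids division by zero in units where both powers are zero.
    return (speech_power / (speech_power + noise_power + eps)) ** beta

def apply_mask(noisy_magnitude, mask):
    """Scale a noisy magnitude spectrogram element-wise by a mask."""
    return np.asarray(noisy_magnitude, dtype=float) * mask

# Toy power spectrograms (2 frames x 3 frequency channels), made up
# purely for illustration.
speech_power = np.array([[4.0, 1.0, 0.0],
                         [9.0, 0.0, 1.0]])
noise_power = np.array([[0.0, 1.0, 4.0],
                        [0.0, 1.0, 3.0]])

irm = ideal_ratio_mask(speech_power, noise_power)

# Assumes additive, uncorrelated speech and noise, so powers add.
noisy_magnitude = np.sqrt(speech_power + noise_power)
enhanced_magnitude = apply_mask(noisy_magnitude, irm)
```

In a supervised setup of the kind the abstract describes, a mask like `irm` would serve as the regression target during training, and at test time the network's estimate would take its place in `apply_mask`.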

Publisher

IOS Press

Subject

Artificial Intelligence,General Engineering,Statistics and Probability

Reference: 22 articles.

1. Gulati S., Comprehensive review of various speech enhancement techniques. In International Conference on Computational Vision and Bio Inspired Computing, Springer, Cham (2020), 536–540.

2. Audio-visual voice activity detection using diffusion maps;Dov;IEEE/ACM Transactions on Audio, Speech, and Language Processing,2015

3. Robust estimation of non-stationary noise power spectrum for speech enhancement;Mai;IEEE/ACM Transactions on Audio, Speech, and Language Processing,2015

4. Robust noise power spectral density estimation for binaural speech enhancement in time-varying diffuse noise field;Ji;EURASIP Journal on Audio, Speech, and Music Processing,2017

5. Decision-directed speech power spectral density matrix estimation for multichannel speech enhancement;Jin;The Journal of the Acoustical Society of America,2017

Cited by 2 articles.

1. BIUnet: Unet Network for Mask Estimation in Single-Channel Speech Enhancement;2024 5th International Seminar on Artificial Intelligence, Networking and Information Technology (AINIT);2024-03-29

2. A Speech Enhancement Method Combining Two-Branch Communication and Spectral Subtraction;Communications in Computer and Information Science;2023