DCT based densely connected convolutional GRU for real-time speech enhancement

Published:2023-07-02 Issue:1 Volume:45 Page:1195-1208
ISSN:1064-1246
Container-title:Journal of Intelligent & Fuzzy Systems
Short-container-title:IFS

Author:

Jannu Chaitanya¹, Vanambathina Sunny Dayal¹

Affiliation:

1. School of Electronics Engineering, VIT-AP University, Amaravati, India

Abstract

Over the past decade, deep learning has enabled significant advances in the enhancement of noisy speech. Because of the short-time stationarity of speech signals, earlier speech enhancement (SE) methods concentrated only on magnitude estimation and reused the phase of the noisy mixture when reconstructing the speech. The performance of these approaches is limited because the phase also carries part of the speech information. Later approaches were developed to estimate magnitude and phase jointly. More recently, complex-valued models such as the deep complex convolution recurrent network (DCCRN) have been proposed, but their computational cost is very high. In this work, we propose a Discrete Cosine Transform-based Densely Connected Convolutional Gated Recurrent Unit (DCTDCCGRU) model using dilated dense blocks and a stacked GRU. Dense connectivity strengthens gradient propagation by concatenating the features of preceding layers at the input of each layer. The dense block has two advantages: the dilated convolutions aid context aggregation at various resolutions, and the dense connectivity yields a feature map with more precise target information by passing features through multiple layers. To model the correlation between neighboring noisy speech frames, a two-layer GRU is added in the bottleneck of the U-Net. Experimental results show that the proposed model outperforms existing models in terms of STOI (short-time objective intelligibility), PESQ (perceptual evaluation of speech quality), and output SNR (signal-to-noise ratio).
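
The abstract does not specify how the transform is computed. As a minimal sketch, assuming the orthonormal DCT-II applied to a single real-valued frame, the following pure-Python example shows why a DCT front end avoids separate magnitude and phase estimation: the coefficients are real, and the frame is recovered exactly from them. The function names `dct_ii` and `idct_ii` and the sample frame are illustrative assumptions, not taken from the paper.

```python
import math


def _scale(k, n):
    # Orthonormal scaling: sqrt(1/n) for the DC term, sqrt(2/n) otherwise.
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_ii(frame):
    """Orthonormal DCT-II of a real-valued frame (list of floats)."""
    n = len(frame)
    return [
        _scale(k, n)
        * sum(x * math.cos(math.pi * (i + 0.5) * k / n) for i, x in enumerate(frame))
        for k in range(n)
    ]


def idct_ii(coeffs):
    """Inverse of dct_ii (the orthonormal DCT-III)."""
    n = len(coeffs)
    return [
        sum(
            _scale(k, n) * c * math.cos(math.pi * (i + 0.5) * k / n)
            for k, c in enumerate(coeffs)
        )
        for i in range(n)
    ]


if __name__ == "__main__":
    # Hypothetical 8-sample frame; a real system would use windowed speech frames.
    frame = [0.0, 0.5, -0.25, 1.0, 0.75, -0.5, 0.1, 0.3]
    coeffs = dct_ii(frame)      # real-valued, signs included
    restored = idct_ii(coeffs)  # matches `frame` up to rounding error
    print(max(abs(a - b) for a, b in zip(frame, restored)))
```

This direct form costs O(n²) per frame and is meant only to make the transform concrete; a practical implementation would use a fast DCT routine from a numerical library.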

Publisher

IOS Press

Subject

Artificial Intelligence, General Engineering, Statistics and Probability

References (15 articles)

1. Discrete cosine transform;Ahmed;IEEE Transactions on Computers,1974

2. Image method for efficiently simulating small-room acoustics;Allen;The Journal of the Acoustical Society of America,1979

3. Features for masking-based monaural speech separation in reverberant conditions;Delfarah;IEEE/ACM Transactions on Audio, Speech, and Language Processing,2017

4. Conv-TasNet: Surpassing ideal time–frequency magnitude masking for speech separation;Luo;IEEE/ACM Transactions on Audio, Speech, and Language Processing,2019

5. A deep learning loss function based on the perceptual evaluation of the speech quality;Martin-Donas;IEEE Signal Processing Letters,2018