Deep neural network techniques for monaural speech enhancement and separation: state of the art analysis

Published: 2023-10-25
Volume: 56, Issue: S3, Pages: 3651-3703
ISSN: 0269-2821
Container-title: Artificial Intelligence Review
Language: en
Short-container-title: Artif Intell Rev

Author:

Ochieng Peter

Abstract

Deep neural network (DNN) techniques have become pervasive in domains such as natural language processing and computer vision, where they have achieved great success in tasks such as machine translation and image generation. Owing to this success, these data-driven techniques have also been applied in the audio domain. More specifically, DNN models have been applied to speech enhancement and separation to perform speech denoising, dereverberation, speaker extraction and speaker separation. In this paper, we review the current DNN techniques employed to achieve speech enhancement and separation. The review covers the whole pipeline of speech enhancement and separation techniques: from feature extraction, through how DNN-based tools model both global and local features of speech and how models are trained (supervised and unsupervised), to how they address the label ambiguity problem. The review also covers the use of domain adaptation techniques and pre-trained models to boost the speech enhancement process. In doing so, we hope to provide an all-inclusive reference to the state-of-the-art DNN-based techniques applied in speech separation and enhancement. We further discuss future research directions. This survey can be used by both academic researchers and industry practitioners working in the speech separation and enhancement domain.
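The "label ambiguity problem" mentioned above arises in speaker separation because the model's output channels have no fixed correspondence to the reference speakers, so a loss computed in a fixed order can penalize a correct but swapped separation. One widely used remedy is permutation invariant training (PIT), which scores every assignment of outputs to references and keeps the best. The following is a minimal, illustrative sketch of an utterance-level PIT loss, assuming mean squared error as the per-source loss and plain Python lists as signals; the function names `mse` and `pit_loss` are hypothetical and are not taken from the review.

```python
from itertools import permutations
from typing import List, Optional, Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length signals."""
    if len(a) != len(b):
        raise ValueError("signals must have equal length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_loss(
    estimates: List[Sequence[float]], references: List[Sequence[float]]
) -> Tuple[float, Optional[Tuple[int, ...]]]:
    """Utterance-level permutation invariant loss (illustrative sketch).

    Assumption: MSE is the per-source loss. Returns (best_loss, best_perm),
    where best_perm[i] is the index of the reference assigned to estimate i.
    """
    n = len(estimates)
    if n != len(references):
        raise ValueError("need as many estimates as references")
    best_loss, best_perm = float("inf"), None
    # Try every output-to-reference assignment and keep the cheapest one.
    for perm in permutations(range(n)):
        loss = sum(mse(estimates[i], references[p]) for i, p in enumerate(perm)) / n
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    s1 = [1.0, 0.0, -1.0, 0.0]
    s2 = [0.0, 2.0, 0.0, -2.0]
    # The "model" separated both speakers perfectly but in swapped order.
    print(pit_loss([s2, s1], [s1, s2]))
```

Run on the swapped outputs, the fixed-order loss would be 2.5 while the permutation invariant loss is 0.0 with assignment `(1, 0)`. Enumerating all permutations costs O(N!) loss evaluations, which is acceptable for the two or three speakers typical of separation benchmarks.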

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence, Linguistics and Language, Language and Linguistics

Link

https://link.springer.com/content/pdf/10.1007/s10462-023-10612-2.pdf

References (268 articles)

1. Adiga N, Pantazis Y, Tsiaras V, Stylianou Y (2019) Speech enhancement for noise-robust speech synthesis using Wasserstein GAN. In: INTERSPEECH, pp 1821–1825

2. Aihara R, Hanazawa T, Okato Y, Wichern G, Roux JL (2019) Teacher-student deep clustering for low-delay single channel speech separation. In: ICASSP, IEEE international conference on acoustics, speech and signal processing—proceedings, vol 2019-May, pp 690–694

3. Ai Y, Li H, Wang X, Yamagishi J, Ling Z (2021) Denoising-and-dereverberation hierarchical neural vocoder for robust waveform generation. In: 2021 IEEE spoken language technology workshop, SLT 2021—proceedings, pp 477–484

4. Allen JB (1982) Applications of the short time Fourier transform to speech processing and spectral analysis. In: ICASSP, IEEE international conference on acoustics, speech and signal processing—proceedings, vol 1982-May, pp 1012–1015

5. Allen JB, Rabiner LR (1977) A unified approach to short-time Fourier analysis and synthesis. Proc IEEE 65(11):1558–1564

Cited by 6 articles.

1. Synthesizing Lithuanian voice replacement for laryngeal cancer patients with Pareto-optimized flow-based generative synthesis network;Applied Acoustics;2024-09

2. Spiking Structured State Space Model for Monaural Speech Enhancement;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

3. Remixed2remixed: Domain Adaptation for Speech Enhancement by Noise2noise Learning with Remixing;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

4. Speech Enhancement and Denoising Audio for Hard-of-Hearing People in Universities;2024 6th International Youth Conference on Radio Electronics, Electrical and Power Engineering (REEPE);2024-02-29

5. Speech Enhancement Based on a Joint Two-Stage CRN+DNN-DEC Model and a New Constrained Phase-Sensitive Magnitude Ratio Mask;IEEE Access;2024