A Survey on Low-Latency DNN-Based Speech Enhancement

Published: 2023-01-26 Issue: 3 Volume: 23 Page: 1380
ISSN: 1424-8220
Container-title: Sensors
Language: en

Author:

Drgas Szymon¹

Affiliation:

1. Institute of Automatic Control and Robotics, Poznan University of Technology, Piotrowo 3A Street, 60-965 Poznan, Poland

Abstract

This paper presents recent advances in low-latency, single-channel, deep neural network-based speech enhancement systems. The sources of latency and their acceptable values in different applications are described. This is followed by an analysis of the constraints imposed on neural network architectures. Specifically, the causal units used in deep neural networks are presented and discussed in the context of their properties, such as the number of parameters, the receptive field, and computational complexity. This is followed by a discussion of techniques used to reduce the computational complexity and memory requirements of the neural networks used in this task. Finally, the techniques used by the winners of the latest speech enhancement challenges (DNS, Clarity) are shown and compared.

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/23/3/1380/pdf

Cited by 4 articles.

1. Deep Learning Based Speech Enhancement on Edge Devices Applied to Assistive Work Equipment;2024 IEEE Sensors Applications Symposium (SAS);2024-07-23

2. A brain-inspired algorithm improves “cocktail party” listening for individuals with hearing loss;2024-05-01

3. Applications of AI-empowered electric vehicles for voice recognition in Asian and Austronesian languages;Artificial Intelligence-Empowered Modern Electric Vehicles in Smart Grid Systems;2024

4. Experimental Investigation of Acoustic Features to Optimize Intelligibility in Cochlear Implants;Sensors;2023-08-31