A Survey on Low-Latency DNN-Based Speech Enhancement
Affiliation:
1. Institute of Automatic Control and Robotics, Poznan University of Technology, Piotrowo 3A Street, 60-965 Poznan, Poland
Abstract
This paper presents recent advances in low-latency, single-channel, deep neural network-based speech enhancement systems. The sources of latency and their acceptable values in different applications are described. This is followed by an analysis of the constraints imposed on neural network architectures. Specifically, the causal units used in deep neural networks are presented and discussed in the context of their properties, such as the number of parameters, the receptive field, and computational complexity. This is followed by a discussion of techniques used to reduce the computational complexity and memory requirements of the neural networks used in this task. Finally, the techniques used by the winners of the latest speech enhancement challenges (DNS, Clarity) are shown and compared.
Subject
Electrical and Electronic Engineering,Biochemistry,Instrumentation,Atomic and Molecular Physics, and Optics,Analytical Chemistry
Reference117 articles.
1. Taal, C.H., Hendriks, R.C., Heusdens, R., and Jensen, J. (2010, January 14–19). A short-time objective intelligibility measure for time-frequency weighted noisy speech. Proceedings of the 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, TX, USA. 2. Rix, A.W., Beerends, J.G., Hollier, M.P., and Hekstra, A.P. (2001, January 7–11). Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs. Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, USA. 3. Ullah, R., Wuttisittikulkij, L., Chaudhary, S., Parnianifard, A., Shah, S., Ibrar, M., and Wahab, F.E. (2022). End-to-End Deep Convolutional Recurrent Models for Noise Robust Waveform Speech Enhancement. Sensors, 22. 4. On training targets for supervised speech separation;Wang;IEEE/ACM Trans. Audio Speech Lang. Process. (TASLP),2014 5. Erdogan, H., Hershey, J.R., Watanabe, S., and Le Roux, J. (2015, January 19–24). Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks. Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia.
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献
|
|