Deep neural networks for automatic speech processing: a survey from large corpora to limited data

Published:2022-08-17 Issue:1 Volume:2022
ISSN:1687-4722
Container-title:EURASIP Journal on Audio, Speech, and Music Processing
language:en
Short-container-title:J AUDIO SPEECH MUSIC PROC.

Author:

Roger Vincent, Farinas Jérôme, Pinquier Julien

Abstract

Most state-of-the-art speech systems use deep neural networks (DNNs). These systems require a large amount of data for training. Hence, training state-of-the-art frameworks on under-resourced speech tasks is difficult. For example, only a limited amount of data may be available to model impaired speech. Furthermore, acquiring more data and/or expertise is time-consuming and expensive. In this paper, we focus on the following speech processing tasks: automatic speech recognition, speaker identification, and emotion recognition. To assess the problem of limited data, we first investigate state-of-the-art automatic speech recognition systems, as this is the hardest task (due to the wide variability in each language). Next, we provide an overview of techniques and tasks requiring less data. In the last section, we investigate few-shot techniques by interpreting under-resourced speech as a few-shot problem. In that sense, we propose an overview of few-shot techniques and the possibility of using such techniques for the speech problems addressed in this survey. Admittedly, the reviewed techniques are not well adapted to large datasets. Nevertheless, some promising results from the literature encourage the use of such techniques for speech processing.

Publisher

Springer Science and Business Media LLC

Subject

Electrical and Electronic Engineering, Acoustics and Ultrasonics

Link

https://link.springer.com/content/pdf/10.1186/s13636-022-00251-w.pdf

References: 61 articles.

1. P. Sahu, M. Dua, A. Kumar, in Speech and Language Processing for Human-machine Communications. Challenges and issues in adopting speech recognition, (2018), pp. 209–215. https://doi.org/10.1007/978-981-10-6626-9_23.

2. J. Barker, S. Watanabe, E. Vincent, J. Trmal, in Interspeech 2018. The Fifth ‘CHiME’ Speech Separation and Recognition Challenge: Dataset, Task and Baselines, (2018), pp. 1561–1565. https://doi.org/10.21437/Interspeech.2018-1768.

3. F. Hernandez, V. Nguyen, S. Ghannay, N. Tomashenko, Y. Estève, in Speech and Computer - 20th International Conference, vol. 11096. TED-LIUM 3: Twice as much data and corpus repartition for experiments on speaker adaptation, (2018), pp. 198–208. https://doi.org/10.1007/978-3-319-99579-3_21.

4. V. Panayotov, G. Chen, D. Povey, S. Khudanpur, in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Librispeech: An ASR corpus based on public domain audio books (IEEE, South Brisbane, 2015), pp. 5206–5210. https://doi.org/10.1109/ICASSP.2015.7178964.

5. J. S. Chung, A. Nagrani, A. Zisserman, in Interspeech 2018. VoxCeleb2: Deep Speaker Recognition, (2018), pp. 1086–1090. https://doi.org/10.21437/Interspeech.2018-1929.

Cited by 8 articles.

1. CoRePooL—Corpus for Resource‐Poor Languages;Automatic Speech Recognition and Translation for Low Resource Languages;2024-03-29

2. Attention Feature Fusion Network via Knowledge Propagation for Automated Respiratory Sound Classification;IEEE Open Journal of Engineering in Medicine and Biology;2024

3. Modeling speech processing in case of neurogenic speech and language disorders: neural dysfunctions, brain lesions, and speech behavior;Frontiers in Language Sciences;2023-10-09

4. Lip2Vec: Efficient and Robust Visual Speech Recognition via Latent-to-Latent Visual to Audio Representation Mapping;2023 IEEE/CVF International Conference on Computer Vision (ICCV);2023-10-01

5. An optimized enhanced-multi learner approach towards speaker identification based on single-sound segments;Multimedia Tools and Applications;2023-08-17