Domain Adaptation with Augmented Data by Deep Neural Network Based Method Using Re-Recorded Speech for Automatic Speech Recognition in Real Environment
Authors:
Nahar Raufun, Miwa Shogo, Kai Atsuhiko
Abstract
The most effective automatic speech recognition (ASR) approaches are based on artificial neural networks (ANNs). ANNs must be trained with an adequate amount of data whose conditions match those of the deployment environment; adapting an ASR model with augmented data that matches the conditions of the real environment therefore improves performance on real data. Real-world speech recordings vary in acoustic characteristics depending on the recording channel and environment, such as the Long Term Evolution (LTE) channel of mobile telephones, where data are transmitted with voice over LTE (VoLTE) technology, or wireless pin microphones in a classroom. Acquiring data with such variation is costly. We therefore propose training ASR models with simulated augmented data and fine-tuning them for domain adaptation using deep neural network (DNN)-based simulated data along with re-recorded data. The DNN-based feature transformation creates realistic speech features from recordings made under clean conditions. This work presents a comparative investigation of different recording-channel adaptation methods for real-world speech recognition. The proposed method yields a 27.0% character error rate reduction (CERR) for the DNN-hidden Markov model (DNN-HMM) hybrid ASR approach and a 36.4% CERR for the end-to-end ASR approach on the target domain of LTE-channel telephone speech.
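To make the DNN-based feature transformation concrete, the sketch below shows one plausible realization: a feed-forward network that maps a spliced window of clean log-mel frames to a single channel-matched frame, trained with a mean-squared-error objective on parallel clean/re-recorded utterances. The architecture, feature dimension, context width, and optimizer settings are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a DNN feature transformation: clean log-mel frames
# (with temporal context) -> channel-degraded ("re-recorded style") frames.
# All hyperparameters here are assumptions for illustration only.
import torch
import torch.nn as nn

class FeatureTransformDNN(nn.Module):
    """Maps a spliced window of clean frames to one channel-matched frame."""
    def __init__(self, feat_dim: int = 40, context: int = 5, hidden: int = 1024):
        super().__init__()
        in_dim = feat_dim * (2 * context + 1)  # spliced context window
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, feat_dim),       # predicted degraded frame
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train_step(model, optimizer, clean_windows, target_frames):
    """One MSE update on a batch of parallel clean/target feature pairs."""
    optimizer.zero_grad()
    pred = model(clean_windows)
    loss = nn.functional.mse_loss(pred, target_frames)
    loss.backward()
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    model = FeatureTransformDNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    # Dummy batch: 32 spliced clean windows (40 dims x 11 frames)
    # and 32 aligned target frames from re-recorded speech.
    clean = torch.randn(32, 40 * 11)
    target = torch.randn(32, 40)
    print("loss:", train_step(model, opt, clean, target))
```

Features produced this way could then augment the training set used to fine-tune the hybrid DNN-HMM or end-to-end ASR model for the target channel.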
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry
Cited by
2 articles.