Speech enhancement augmentation for robust speech recognition in noisy environments

Published:2024 Volume:59 Page:04003
ISSN:2271-2097
Container-title:ITM Web of Conferences
Short-container-title:ITM Web Conf.

Author:

Nasretdinov Rauf, Lependin Andrey, Ilyashenko Ilya

Abstract

Data augmentation has become an important tool for improving the performance of speech recognition systems. To make a system work in noisy conditions, augmentation is usually used to simulate background noise. However, this does not improve recognition quality on samples that have been pre-processed by noise reduction models. This paper proposes a new approach to speech data augmentation for training ASR systems that are intended to be used jointly with speech enhancement models. The approach creates several additional data samples containing speech processed by the enhancement model. It was tested on the E-Branchformer neural network model using data from the LibriSpeech corpus. The quality of speech samples was assessed using the DNSMOS metric. On a 100-hour subset of clean speech, the proposed augmentation improved WER by more than 4% absolute compared to the generally accepted approach of adding noisy speech samples. Experiments on the 960-hour data showed that the approach remains robust as the training set size increases.
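The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which variants are generated, which enhancement model is used, or at what SNR noise is mixed, so the four variants, the `snr_db` default, and the names `mix_at_snr`, `build_variants`, and `moving_average` are all assumptions made for illustration. The `moving_average` function is only a stand-in for a real speech enhancement model. A small `wer` helper is included to show what an absolute WER difference refers to.

```python
import math
from typing import Callable, Dict, List

Waveform = List[float]


def mix_at_snr(speech: Waveform, noise: Waveform, snr_db: float) -> Waveform:
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db.

    Assumes both signals have the same length and the noise is not silent.
    """
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]


def moving_average(wave: Waveform, width: int = 3) -> Waveform:
    """Placeholder 'enhancer' (NOT a real enhancement model): simple smoothing."""
    half = width // 2
    smoothed = []
    for i in range(len(wave)):
        window = wave[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def build_variants(
    clean: Waveform,
    noise: Waveform,
    enhance: Callable[[Waveform], Waveform],
    snr_db: float = 10.0,  # assumed value, not taken from the paper
) -> Dict[str, Waveform]:
    """Return the original utterance plus noisy and enhanced copies.

    Which copies the paper actually adds to the training set is an assumption.
    """
    noisy = mix_at_snr(clean, noise, snr_db)
    return {
        "clean": clean,
        "noisy": noisy,
        "enhanced_clean": enhance(clean),
        "enhanced_noisy": enhance(noisy),
    }


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)
```

In this sketch, every variant returned by `build_variants` would share the transcript of the clean utterance during training. An improvement "in absolute value" is the plain difference between two WER values (for example, 0.20 versus 0.16 is 4% absolute), not a ratio of the two.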

Publisher

EDP Sciences

Link

https://www.itm-conferences.org/10.1051/itmconf/20245904003/pdf

References (26 articles)

1. Jaitly N., Hinton G. E., Vocal tract length perturbation (VTLP) improves speech recognition, in Proceedings of the International Conference on Machine Learning, ICML, Workshop on Deep Learning for Audio, Speech, and Language Processing, 20-21 June 2013, Atlanta, USA (2013)

2. Ko T., Peddinti V., Povey D., Khudanpur S., Audio Augmentation for Speech Recognition, in Proceedings of the Interspeech, 6-10 September 2015, Dresden, Germany (2015)

3. Park D. S., Chan W., Zhang Y., Chiu C., Zoph B., Cubuk E. D., SpecAugment: A simple data augmentation method for automatic speech recognition, in Proceedings of the Interspeech, 15-19 September 2019, Graz, Austria (2019)

4. Panayotov V., Chen G., Povey D., Khudanpur S., LibriSpeech: An ASR corpus based on public domain audio books, in Proceedings of the International Conference on Acoustics, Speech and Signal Processing, ICASSP, 19-24 April 2015, Brisbane, Queensland, Australia (2015)

5. Rosenberg A., Zhang Y., Ramabhadran B., Jia Y., Moreno P., Wu Y., Wu Z., Speech recognition with augmented synthesized speech, in Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, ASRU, 14-18 December 2019, Sentosa, Singapore (2019)