A Method Improves Speech Recognition with Contrastive Learning in Low-Resource Languages
Published: 2023-04-12
Issue: 8
Volume: 13
Page: 4836
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences
Author:
Sun Lixu 1,2, Yolwas Nurmemet 1,2, Jiang Lina 1,2
Affiliation:
1. Xinjiang Multilingual Information Technology Laboratory, Urumqi 830017, China
2. College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
Abstract
Building an effective automatic speech recognition system typically requires a large amount of high-quality labeled data; however, such data can be difficult to obtain for low-resource languages. Self-supervised contrastive learning has shown promising results in low-resource automatic speech recognition, but the quality of the negative sample set used in speech contrastive learning has so far received little attention. In this paper, we propose the false negatives impact elimination (FNIE) method, which filters out false negative samples and thereby improves the quality of the negative sample set. FNIE compares the support vector with the negative sample vector set and optimizes the corresponding loss function, allowing the model to learn better speech representations and achieve superior results in low-resource speech recognition. Experiments demonstrate that FNIE effectively filters negative samples, enhances the quality of the negative sample set, and improves the accuracy of speech recognition. The quality of the negative sample set significantly affects the model's learning ability, and using too many negative samples can degrade it. In a low-resource setting, our FNIE method achieved a relative improvement of 2.98% in WER on the English dataset, 14.3% in WER on the Uyghur dataset, and 4.04% in CER on the Mandarin dataset compared to the baseline model.
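The abstract only states that FNIE compares a support vector against the negative sample vectors and optimizes the corresponding loss; the concrete filtering rule is given in the paper itself. As a rough Python sketch of the general idea (the cosine-similarity test, the threshold tau, the temperature, and all names here are illustrative assumptions, not the authors' actual formulation), a wav2vec 2.0-style InfoNCE loss could exclude suspected false negatives like this:

```python
import torch
import torch.nn.functional as F

def fnie_style_contrastive_loss(anchor, positive, negatives, support,
                                tau=0.9, temperature=0.1):
    """Illustrative sketch only: mask out suspected false negatives
    before computing an InfoNCE-style contrastive loss.

    anchor:    (D,)   context vector at a masked time step
    positive:  (D,)   true quantized target for that step
    negatives: (K, D) sampled distractors, possibly containing false negatives
    support:   (D,)   reference ("support") vector used for filtering
    tau, temperature: assumed hyperparameters, not taken from the paper
    """
    # Similarity of each negative to the support vector; negatives that are
    # too close to it are treated as false negatives and dropped.
    sim_to_support = F.cosine_similarity(negatives, support.unsqueeze(0), dim=-1)
    keep = sim_to_support < tau                      # (K,) boolean mask

    # Positive logit vs. logits of the *filtered* negative set.
    pos_logit = F.cosine_similarity(anchor.unsqueeze(0),
                                    positive.unsqueeze(0), dim=-1) / temperature
    neg_logits = F.cosine_similarity(negatives, anchor.unsqueeze(0),
                                     dim=-1)[keep] / temperature

    logits = torch.cat([pos_logit, neg_logits])      # (1 + K',)
    return -F.log_softmax(logits, dim=0)[0]          # NLL of the positive

# Toy usage with random vectors (D = 256 features, K = 100 distractors):
loss = fnie_style_contrastive_loss(torch.randn(256), torch.randn(256),
                                   torch.randn(100, 256), torch.randn(256))
```

The key design point the abstract emphasizes is reflected in the mask: shrinking the denominator to exclude likely false negatives changes which samples the model is pushed away from, which is why the quality of the negative set matters more than its raw size.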
Funder
National Natural Science Foundation of China; National Language Commission Key Project
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by: 4 articles.