A Method Improves Speech Recognition with Contrastive Learning in Low-Resource Languages

Published: 2023-04-12 Issue: 8 Volume: 13 Page: 4836
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Sun Lixu¹², Yolwas Nurmemet¹², Jiang Lina¹²

Affiliation:

1. Xinjiang Multilingual Information Technology Laboratory, Urumqi 830017, China

2. College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China

Abstract

Building an effective automatic speech recognition system typically requires a large amount of high-quality labeled data; however, such data can be difficult to obtain for low-resource languages. Self-supervised contrastive learning has recently shown promising results in low-resource automatic speech recognition, but the quality of negative sample sets in speech contrastive learning has not been discussed. In this paper, we propose the false negatives impact elimination (FNIE) method to filter false negative samples and improve the quality of the negative sample set in speech. FNIE compares the support vector with the negative sample vector set and optimizes the corresponding loss function, allowing the model to learn better speech representations and achieve superior results in low-resource speech recognition. Experiments demonstrate that FNIE effectively filters negative samples, enhances the quality of the negative sample set, and improves the accuracy of speech recognition. The quality of the negative sample set significantly affects the model’s learning ability, and using too many negative samples can degrade it. In a low-resource setting, our FNIE method achieved a relative improvement of 2.98% in WER on the English dataset, 14.3% in WER on the Uyghur dataset, and 4.04% in CER on the Mandarin dataset compared to the baseline model.
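
The abstract does not spell out how FNIE compares the support vector with the negative samples, so the following is only a minimal sketch of the general idea of false-negative filtering in a contrastive (InfoNCE-style) loss. It assumes the "support vector" plays the role of the anchor, that similarity is cosine similarity, and that negatives above a fixed threshold are dropped; the function names, the threshold of 0.9, the temperature, and the toy vectors are all hypothetical and not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def filter_false_negatives(anchor, negatives, threshold=0.9):
    """Keep only negatives that are clearly dissimilar to the anchor.

    Assumption: a negative whose similarity to the anchor reaches the
    threshold is treated as a likely false negative and removed.
    """
    return [n for n in negatives if cosine(anchor, n) < threshold]


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for a single anchor."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Toy example: the third "negative" is almost identical to the anchor.
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0], [0.99, 0.05]]

kept = filter_false_negatives(anchor, negatives)
loss_all = info_nce(anchor, positive, negatives)
loss_filtered = info_nce(anchor, positive, kept)

print(len(kept), loss_filtered < loss_all)
```

In this toy setting the near-duplicate negative is removed, and the loss computed on the filtered set is lower because the model is no longer penalized for being close to a sample that resembles the anchor. A fixed threshold is only one possible filtering rule; the paper's actual criterion may differ.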

Funder

National Natural Science Foundation of China

National Language Commission Key Project

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/8/4836/pdf

Cited by 4 articles.

1. Self-paced contrastive learning for knowledge tracing;Neurocomputing;2024-12

2. Helicopter cockpit speech recognition method based on transfer learning and context biasing;Engineering Research Express;2024-08-14

3. Shared Knowledge-Based Contrastive Federated Learning for Partial Discharge Diagnosis in Gas-Insulated Switchgear;IEEE Access;2024

4. DABaCLT: A Data Augmentation Bias-Aware Contrastive Learning Framework for Time Series Representation;Applied Sciences;2023-07-06