Exploring task-diverse meta-learning on Tibetan multi-dialect speech recognition

Published:2024-07-17 Issue:1 Volume:2024 Page:
ISSN:1687-4722
Container-title:EURASIP Journal on Audio, Speech, and Music Processing
language:en
Short-container-title:J AUDIO SPEECH MUSIC PROC.

Author:

Liu Yigang, Zhao Yue, Xu Xiaona, Xu Liang, Zhang Xubei, Ji Qiang

Abstract

Differences in phonetics and corpora across the three major Tibetan dialects make it difficult for a single-task model trained on one dialect to accommodate the others. To address this issue, this paper proposes task-diverse meta-learning, which allows the model to acquire more comprehensive and robust features and so adapt more readily to variation among dialects. The study uses Tibetan dialect ID recognition and Tibetan speaker recognition as the source tasks for meta-learning, with the aim of strengthening the model’s ability to discriminate differences among dialects and thereby improving its performance on Tibetan multi-dialect speech recognition. Experimental results show that task-diverse meta-learning improves performance on Tibetan multi-dialect speech recognition, demonstrating the effectiveness and applicability of the approach and contributing to the advancement of speech recognition in multi-dialect environments.

Funder

National Natural Science Foundation of China

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1186/s13636-024-00361-7.pdf

References (26 articles)

1. N. Zhou, Research on Tibetan non-specific person continuous speech recognition based on deep learning. Master’s thesis, Central University for Nationalities (2017)

2. X. Huang, J. Li, Acoustic model for Tibetan speech recognition based on recurrent neural network. J. Chin. Inf. 32(5), 189–191 (2018)

3. Q. Wang, W. Guo, C. Xie, Tibetan speech recognition based on end-to-end technology. Pattern Recognit. Artif. Intell. 30(4), 359–363 (2017)

4. S. Yuan, W. Guo, L. Dai, Tibetan language recognition based on deep neural networks. Pattern Recognit. Artif. Intell. 28(3), 209–213 (2015)

5. S. Min, M. Lewis, L. Zettlemoyer et al., MetaICL: Learning to learn in context. arXiv preprint arXiv:2110.15943 (2021)