Building a Speech Dataset and Recognition Model for the Minority Tu Language

Published: 2024-08-04 Issue: 15 Volume: 14 Page: 6795
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Kong Shasha¹, Li Chunmei¹, Fang Chengwu¹, Yang Peng¹

Affiliation:

1. College of Computer Technology and Applications, Qinghai University, Xining 810016, China

Abstract

Speech recognition technology has many applications in our daily life. However, for many low-resource languages without written forms, acquiring sufficient training data remains a significant challenge for building accurate ASR models. The Tu language, spoken by an ethnic minority group in Qinghai Province in China, is one such example. Due to the lack of written records and the great diversity in regional pronunciations, there has been little previous research on Tu-language speech recognition. This work seeks to address this research gap by creating the first speech dataset for the Tu language spoken in Huzhu County, Qinghai. We first formulated the relevant pronunciation rules for the Tu language based on linguistic analysis. Then, we constructed a new speech corpus, named HZ-TuDs, through targeted data collection and annotation. Based on the HZ-TuDs dataset, we designed several baseline sequence-to-sequence deep neural models for end-to-end Tu-language speech recognition. Additionally, we proposed a novel SA-conformer model, which combines convolutional and channel attention modules to better extract speech features. Experiments showed that our proposed SA-conformer model can significantly reduce the character error rate from 23% to 12%, effectively improving the accuracy of Tu language recognition compared to previous approaches. This demonstrates the effectiveness of our dataset construction and model design efforts in advancing speech recognition technology for this low-resource minority language.
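The abstract reports results as character error rate (CER) but does not say how it is computed. The sketch below assumes the common definition: total character-level edit distance between reference and hypothesis transcripts, divided by the total number of reference characters. The function names (`edit_distance`, `cer`, `relative_reduction`) are hypothetical and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences.

    Counts the minimum number of substitutions, insertions, and
    deletions needed to turn `hyp` into `ref`.
    """
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(references, hypotheses):
    """Corpus-level CER: total edits divided by total reference characters.

    Assumed definition; the paper may normalize differently.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have equal length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(
        edit_distance(r, h) for r, h in zip(references, hypotheses)
    )
    return total_edits / total_chars


def relative_reduction(baseline, improved):
    """Relative error reduction between two error rates."""
    return (baseline - improved) / baseline
```

Under this definition, the reported drop from 23% to 12% CER is an absolute reduction of 11 percentage points, or roughly 48% relative to the baseline (`relative_reduction(0.23, 0.12)`).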

Funder

National Natural Science Foundation of China

Basic Research Project of Qinghai Province, China

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/15/6795/pdf
