Affiliation:
1. College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China
Abstract
Language identification is the front end of multilingual speech-processing tasks. This study aims to improve the accuracy of language identification in complex acoustic environments by proposing a multi-scale feature extraction method. The method replaces the baseline feature extraction network with a multi-scale feature extraction network (SE-Res2Net-CBAM-BILSTM). A multilingual cocktail party dataset was simulated, and comparative experiments were conducted with various models. The experimental results show that the proposed model achieved language identification accuracies of 97.6% on an Oriental language dataset and 75% on the multilingual cocktail party dataset. Furthermore, the comparative experiments show that the proposed model outperformed three other models in accuracy, recall, and F1 score. Finally, a comparison of different loss functions shows that the model performed better when trained with focal loss.
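The abstract names focal loss but gives neither its formula nor its hyperparameters. For orientation, the following is a minimal sketch of the standard multi-class focal loss, FL = -alpha * (1 - p_t)^gamma * log(p_t), for a single utterance. The function names and the values gamma = 2.0 and alpha = 1.0 are illustrative assumptions, not settings reported by the authors.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Plain cross-entropy for one sample: -log(p_t)."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Standard focal loss for one sample (gamma and alpha are assumed values).

    The factor (1 - p_t) ** gamma shrinks the loss of samples that are
    already classified confidently, so harder samples dominate training.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical scores for three candidate languages; class 0 is correct.
easy = [5.0, 0.0, 0.0]   # confident, correct prediction
hard = [0.2, 0.0, 0.1]   # nearly uniform prediction

easy_ratio = focal_loss(easy, 0) / cross_entropy(easy, 0)
hard_ratio = focal_loss(hard, 0) / cross_entropy(hard, 0)
```

With gamma set to 0 the expression reduces to ordinary cross-entropy. Comparing `easy_ratio` with `hard_ratio` shows how much more strongly the confident sample is down-weighted than the ambiguous one.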
Funder
Strengthening Plan of the National Defense Science and Technology Foundation of China
Natural Science Foundation of China
Subject
Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science
Cited by
4 articles.