Multi-Scale Feature Learning for Language Identification of Overlapped Speech

Published:2023-03-27 Issue:7 Volume:13 Page:4235
ISSN:2076-3417
Container-title:Applied Sciences
language:en

Author:

Aysa Zuhragvl¹, Ablimit Mijit¹, Hamdulla Askar¹

Affiliation:

1. College of Information Science and Engineering, Xinjiang University, Urumqi 830017, China

Abstract

Language identification is the front end of multilingual speech-processing tasks. This study aims to improve the accuracy of language identification in complex acoustic environments by proposing a multi-scale feature extraction method. The method replaces the baseline feature extraction network with a multi-scale feature extraction network (SE-Res2Net-CBAM-BILSTM). A multilingual cocktail party dataset was simulated, and comparative experiments were conducted with various models. The experimental results show that the proposed model achieved language identification accuracies of 97.6% on an Oriental language dataset and 75% on a multilingual cocktail party dataset. Furthermore, comparative experiments show that the proposed model outperformed three other models in accuracy, recall, and F1 score. Finally, a comparison of different loss functions shows that the model performed better when using focal loss.
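
The abstract does not state which focal loss formulation or hyperparameters were used. As a minimal sketch, assuming the commonly used multi-class form FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), the snippet below shows how the loss down-weights utterances the classifier already gets right. The function names and the values gamma=2.0 and alpha=1.0 are illustrative assumptions, not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy for one example; target is the true class index."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example (assumed form, hypothetical defaults).

    The factor (1 - p_t) ** gamma shrinks the loss of confidently correct
    examples, so training focuses on the hard ones. With gamma=0 and
    alpha=1 this reduces to plain cross-entropy.
    """
    p_t = softmax(logits)[target]
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical logits for a three-language classifier, true class index 0.
easy_example = [4.0, 0.0, 0.0]  # model is confident and correct
hard_example = [0.0, 1.0, 0.0]  # model prefers the wrong language

for name, logits in [("easy", easy_example), ("hard", hard_example)]:
    ce = cross_entropy(logits, 0)
    fl = focal_loss(logits, 0)
    print(f"{name}: cross-entropy={ce:.4f} focal={fl:.4f}")
```

In this sketch the easy example keeps only a small fraction of its cross-entropy loss while the hard example keeps most of it, which is the behaviour that can help on heavily overlapped speech where many samples are difficult.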

Funder

Strengthening Plan of the National Defense Science and Technology Foundation of China

Natural Science Foundation of China

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/7/4235/pdf

Reference: 36 articles.

1. Hazen, T.J., and Zue, V.W. (1994, January 18–22). Recent improvements in an approach to segment-based automatic language identification. Proceedings of the 3rd International Conference on Spoken Language Processing (ICSLP 1994), Yokohama, Japan.

2. Navratil (2001). Spoken language identification-a step toward multilinguality in speech processing. IEEE Trans. Speech Audio Process.

3. Chen (2022). WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE J. Sel. Top. Signal Process.

4. Wong, E. (2004). Automatic Spoken Language Identification Utilizing Acoustic and Phonetic Speech Information. [Ph.D. Thesis, Queensland University of Technology].

5. Lopez-Moreno, I., Gonzalez-Dominguez, J., Plchot, O., Martinez, D., Gonzalez-Rodriguez, J., and Moreno, P. (2014, January 4–9). Automatic language identification using deep neural networks. Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy.

Cited by 4 articles.

1. Estimation of Muscle Forces of Lower Limbs Based on CNN–LSTM Neural Network and Wearable Sensor System;Sensors;2024-02-05

2. Methods for processing and analyzing passive acoustic monitoring data: An example of song recognition in western black-crested gibbons;Ecological Indicators;2023-11

3. An Investigation of ECAPA-TDNN Audio Type Recognition Method Based on Mel Acoustic Spectrograms;Electronics;2023-10-27

4. Speaker Verification Based on Single Channel Speech Separation;IEEE Access;2023