DFNet: Decoupled Fusion Network for Dialectal Speech Recognition

Published: 2024-06-17 Issue: 12 Volume: 12 Page: 1886
ISSN: 2227-7390
Container-title: Mathematics
Language: en

Authors:

Zhu Qianqiao¹, Gao Lu¹, Qin Ling¹

Affiliation:

1. School of Digital and Intelligence Industry, Inner Mongolia University of Science and Technology, Baotou 014010, China

Abstract

Deep learning often falls short in dialect speech recognition when training data are limited and model training is complex. Differences between Mandarin and its dialects, such as varied pronunciation variants and the distinct linguistic features of dialects, often cause a significant decline in recognition performance. In addition, existing work often overlooks the similarities between Mandarin and its dialects and fails to leverage these connections to improve recognition accuracy. To address these challenges, we propose the Decoupled Fusion Network (DFNet). Through feature decoupling, the network extracts private and shared acoustic features of the different languages, improving its adaptation to both the uniqueness and the similarity of the two speech patterns. We also design a heterogeneous information-weighted fusion module that combines the decoupled Mandarin and dialect features. This strategy exploits the similarity between Mandarin and its dialects, enables the sharing of multilingual information, and notably enhances the model's recognition capability on low-resource dialect data. An evaluation on the Henan and Guangdong datasets shows that DFNet improves performance by 2.64% and 2.68%, respectively. Additionally, extensive ablation and comparison experiments demonstrate the effectiveness of the method.
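The abstract describes two components: decoupling acoustic features into shared and private parts, and a weighted module that fuses the decoupled representations. The sketch below illustrates only the general idea under stated assumptions: plain linear projections stand in for the shared and private encoders, and a single scalar sigmoid gate stands in for the heterogeneous information-weighted fusion module. The names `decouple`, `weighted_fusion`, `W_shared`, `W_private`, `w_gate` and `b_gate` are hypothetical; the paper's actual layers, training losses and gating may differ.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def decouple(x: Vector, W_shared: Matrix, W_private: Matrix) -> Tuple[Vector, Vector]:
    """Hypothetical decoupling step: project one acoustic feature frame into a
    shared subspace (meant to capture what Mandarin and the dialect have in
    common) and a private subspace (meant to capture language-specific traits).
    Plain linear maps stand in for the real encoders."""
    return matvec(W_shared, x), matvec(W_private, x)


def weighted_fusion(h_a: Vector, h_b: Vector, w_gate: Vector, b_gate: float) -> Vector:
    """Generic gated fusion, assumed for illustration: a scalar gate g in (0, 1),
    computed from the concatenation [h_a; h_b], mixes the two inputs as
    g * h_a + (1 - g) * h_b. w_gate must have len(h_a) + len(h_b) entries."""
    score = sum(w * v for w, v in zip(w_gate, h_a + h_b)) + b_gate
    g = sigmoid(score)
    return [g * a + (1.0 - g) * b for a, b in zip(h_a, h_b)]


# Usage: a toy 3-dimensional frame with 2-dimensional shared/private subspaces.
x = [1.0, 2.0, 3.0]
shared, private = decouple(x, [[1, 0, 0], [0, 1, 0]], [[0, 0, 1], [0, 1, 0]])
fused = weighted_fusion(shared, private, w_gate=[0.0] * 4, b_gate=0.0)
print(shared, private, fused)  # [1.0, 2.0] [3.0, 2.0] [2.0, 2.0]
```

With an all-zero gate the two inputs are averaged; in a trained model the gate weights would be learned, letting the network lean on shared (cross-language) features or on private (dialect-specific) features frame by frame.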

Funder

National Natural Science Foundation of China

Science and Technology Project of Inner Mongolia Autonomous Region

Publisher

MDPI AG

Link

https://www.mdpi.com/2227-7390/12/12/1886/pdf
