Multi-Task Transformer with Adaptive Cross-Entropy Loss for Multi-Dialect Speech Recognition

Published: 2022-10-08
Journal: Entropy (ISSN 1099-4300), Volume 24, Issue 10, Page 1429
Language: English

Authors:

Dan Zhengjia, Zhao Yue, Bi Xiaojun, Wu Licheng, Ji Qiang

Abstract

At present, most multi-dialect speech recognition models are based on a hard-parameter-sharing multi-task structure, which makes it difficult to reveal how one task contributes to the others. In addition, to balance multi-task learning, the weights of the multi-task objective function must be tuned manually. This makes multi-task learning difficult and costly, because finding the optimal task weights requires repeatedly trying different weight combinations. In this paper, we propose a multi-dialect acoustic model that combines soft-parameter-sharing multi-task learning with the Transformer, and we introduce several auxiliary cross-attention modules so that the auxiliary task (dialect ID recognition) can provide dialect information to the multi-dialect speech recognition task. Furthermore, we use an adaptive cross-entropy loss as the multi-task objective function; it automatically balances the learning of the multi-task model according to the loss proportion of each task during training. As a result, the optimal weight combination can be found without any manual intervention. Finally, on the two tasks of multi-dialect (including low-resource dialect) speech recognition and dialect ID recognition, the experimental results show that, compared with the single-dialect Transformer, the single-task multi-dialect Transformer, and the multi-task Transformer with hard parameter sharing, our method significantly reduces the average syllable error rate of Tibetan multi-dialect speech recognition and the character error rate of Chinese multi-dialect speech recognition.
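The abstract says the adaptive cross-entropy loss balances the tasks according to the loss proportion of each task, but it does not give the formula. The Python sketch below shows one possible reading, assuming that each task's weight is its current loss divided by the sum of all task losses and that the weights are recomputed at every training step. The function names (`cross_entropy`, `loss_proportion_weights`, `adaptive_multitask_loss`) and the task keys are hypothetical; this is an illustration of loss-proportion weighting, not the authors' published formulation.

```python
import math
from typing import Dict, Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Cross-entropy of one prediction: -log of the probability of the target class.

    The target probability must be positive (math.log raises ValueError on 0).
    """
    return -math.log(probs[target])


def loss_proportion_weights(task_losses: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical rule: each task's weight is its share of the summed loss."""
    total = sum(task_losses.values())
    if total <= 0.0:
        # Degenerate case (all losses zero): fall back to uniform weights.
        n = len(task_losses)
        return {name: 1.0 / n for name in task_losses}
    return {name: loss / total for name, loss in task_losses.items()}


def adaptive_multitask_loss(task_losses: Dict[str, float]) -> float:
    """Weighted sum of task losses, with weights recomputed from the current losses."""
    weights = loss_proportion_weights(task_losses)
    return sum(weights[name] * loss for name, loss in task_losses.items())


if __name__ == "__main__":
    # Toy per-task losses for a single step (made-up numbers, not results from the paper).
    losses = {
        "asr": cross_entropy([0.1, 0.2, 0.7], target=0),           # poorly predicted
        "dialect_id": cross_entropy([0.9, 0.05, 0.05], target=0),  # well predicted
    }
    print(loss_proportion_weights(losses))
    print(adaptive_multitask_loss(losses))
```

Under this assumption, the task that is currently learning worse (higher loss) receives the larger weight, so no fixed weights need to be hand-tuned. In an autograd framework, the weights would typically be computed from detached loss values (for example with `.detach()` in PyTorch) so that gradients flow only through the losses themselves.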
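The results are reported as a syllable error rate (Tibetan) and a character error rate (Chinese). Both are edit-distance metrics: the minimum number of substitutions, deletions, and insertions that separate the hypothesis from the reference, divided by the number of units in the reference. The minimal Python sketch below computes this rate; the tokenizers are simplifying assumptions (Chinese split into individual non-space characters, Tibetan split into syllables at the tsheg U+0F0B and shad U+0F0D marks), not the paper's own preprocessing.

```python
from typing import List, Sequence

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark
SHAD = "\u0f0d"   # Tibetan clause delimiter


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference unit
                curr[j - 1] + 1,         # insertion of a hypothesis unit
                prev[j - 1] + (r != h),  # substitution, or a match at no cost
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (CER or SER, depending on units)."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def chinese_chars(text: str) -> List[str]:
    """Assumed CER units: every non-whitespace character."""
    return [c for c in text if not c.isspace()]


def tibetan_syllables(text: str) -> List[str]:
    """Assumed SER units: text split at tsheg and shad marks (a naive syllabifier)."""
    for mark in (TSHEG, SHAD):
        text = text.replace(mark, " ")
    return text.split()


if __name__ == "__main__":
    ref = chinese_chars("今天天气好")
    hyp = chinese_chars("今天天好")  # one character deleted
    print(error_rate(ref, hyp))
```

Reporting an average over dialects, as the abstract does for Tibetan, would then mean combining such per-dialect rates; how the paper averages them is not stated here.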

Publisher

MDPI AG

Subject

General Physics and Astronomy

Link

https://www.mdpi.com/1099-4300/24/10/1429/pdf

References: 33 articles

Cited by 9 articles.

1. Automatic speech recognition using advanced deep learning approaches: A survey. Information Fusion, 2024-09.

2. Exploring the frontier: Transformer-based models in EEG signal analysis for brain-computer interfaces. Computers in Biology and Medicine, 2024-08.

3. Chinese dialect speech recognition: a comprehensive survey. Artificial Intelligence Review, 2024-01-31.

4. Improving Prediction of Chronic Kidney Disease Using KNN Imputed SMOTE Features and TrioNet Model. Computer Modeling in Engineering & Sciences, 2024.

5. An Enterprise Service Demand Classification Method Based on One-Dimensional Convolutional Neural Network with Cross-Entropy Loss and Enterprise Portrait. Entropy, 2023-08-14.