MSFL: Explainable Multitask-Based Shared Feature Learning for Multilingual Speech Emotion Recognition

Published:2022-12-13 Issue:24 Volume:12 Page:12805
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Author:

Ma Yiping, Wang Wei

Abstract

Speech emotion recognition (SER), a rapidly evolving task that aims to recognize the emotion of speakers, has become a key research area in affective computing. However, the variety of languages in natural multilingual scenarios severely challenges the generalization ability of SER models, causing performance to drop quickly and prompting researchers to ask how multilingual SER can be improved. Recent studies mainly use feature fusion and language-controlled models to address this challenge, but key points such as the intrinsic association between languages and in-depth analysis of multilingual shared features (MSFs) are still neglected. To solve this problem, an explainable Multitask-based Shared Feature Learning (MSFL) model is proposed for multilingual SER. Introducing multi-task learning (MTL) provides MSFL with information from the related task of language recognition, improves its generalization in multilingual situations, and lays the foundation for learning MSFs. Specifically, with both generalization capability and interpretability in mind, the MTL module was combined with long short-term memory and an attention mechanism to maintain generalization in multilingual situations. Then, the feature weights obtained from the attention mechanism were ranked in descending order, and the top-ranked MSFs were compared with the top-ranked monolingual features, improving model interpretability through this feature comparison. Experiments on the Emo-DB, CASIA, and SAVEE corpora evaluated both model generalization and interpretability. The results indicate that MSFL performs better than most state-of-the-art models, with an average improvement of 3.37–4.49%. In addition, the top 10 MSFs include almost all of the top-ranked features of the three monolingual feature sets, which demonstrates the interpretability of MSFL.
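
The interpretability step described in the abstract (rank attention weights in descending order, then compare the top-ranked shared features with the top-ranked monolingual ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the feature names, and all weight values below are hypothetical and chosen only to show the ranking and comparison logic.

```python
from typing import Dict, List, Set


def top_k_features(weights: Dict[str, float], k: int) -> List[str]:
    """Return the k feature names with the largest weights, in descending order."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


def shared_overlap(
    shared: Dict[str, float],
    monolingual: Dict[str, Dict[str, float]],
    k: int,
) -> Dict[str, Set[str]]:
    """For each corpus, return the features that appear in both the
    top-k shared ranking and the top-k monolingual ranking."""
    shared_top = set(top_k_features(shared, k))
    return {
        corpus: shared_top & set(top_k_features(weights, k))
        for corpus, weights in monolingual.items()
    }


# Hypothetical attention weights (invented for illustration, not from the paper).
shared_weights = {
    "mfcc_1": 0.30, "f0_mean": 0.25, "energy_rms": 0.20, "zcr": 0.15, "jitter": 0.10,
}
monolingual_weights = {
    "Emo-DB": {"mfcc_1": 0.35, "f0_mean": 0.30, "zcr": 0.20, "energy_rms": 0.10, "jitter": 0.05},
    "CASIA": {"energy_rms": 0.40, "mfcc_1": 0.25, "jitter": 0.20, "f0_mean": 0.10, "zcr": 0.05},
    "SAVEE": {"f0_mean": 0.45, "jitter": 0.25, "mfcc_1": 0.15, "zcr": 0.10, "energy_rms": 0.05},
}

overlap = shared_overlap(shared_weights, monolingual_weights, k=2)
for corpus, features in overlap.items():
    print(corpus, sorted(features))
```

The abstract reports this kind of comparison for the top 10 features; `k=2` is used here only to keep the toy data small.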

Funder

Chinese National Social Science Foundation

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes,Computer Science Applications,Process Chemistry and Technology,General Engineering,Instrumentation,General Materials Science

Link

https://www.mdpi.com/2076-3417/12/24/12805/pdf

References: 77 articles.

1. Dellaert, F., Polzin, T., and Waibel, A. (1996, January 3–6). Recognizing Emotion in Speech. Proceedings of the Fourth International Conference on Spoken Language Processing, ICSLP ’96, Philadelphia, PA, USA.

2. Classifying Emotions and Engagement in Online Learning Based on a Single Facial Expression Recognition Neural Network;Savchenko;IEEE Trans. Affect. Comput.,2022

3. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer;Raffel;J. Mach. Learn. Research.,2022

4. EEG-Based Emotion Recognition Using Regularized Graph Neural Networks;Zhong;IEEE Trans. Affect. Comput.,2022

5. Dimensional Speech Emotion Recognition Review;Li;Ruan Jian Xue Bao/J. Softw.,2020

Cited by 3 articles.

1. Feature-Enhanced Multi-Task Learning for Speech Emotion Recognition Using Decision Trees and LSTM;Electronics;2024-07-10

2. Multi-language: ensemble learning-based speech emotion recognition;International Journal of Data Science and Analytics;2024-05-07

3. The Use of Multi-Feature Fusion in the Evaluation of Emotional Expressions in Spoken English;Applied Mathematics and Nonlinear Sciences;2024-01-01