Multimodal and Multitask Learning with Additive Angular Penalty Focus Loss for Speech Emotion Recognition

Published:2023-10-17 Volume:2023 Page:1-13
ISSN:1098-111X
Container-title:International Journal of Intelligent Systems
language:en

Author:

Wen Guihua¹, Ye Sheng¹, Li Huihui², Wen Pengcheng¹, Zhang Yuhan³

Affiliation:

1. School of Computer Science and Engineering, South China University of Technology, Guangzhou, China

2. School of Computer Science, Guangdong Polytechnic Normal University, Guangzhou, China

3. Department of Neurology, Dongguan Songshanhu Central Hospital, Dongguan, China

Abstract

Speech emotion recognition has many applications, such as human-computer interaction and health management. Current methods are challenged by fuzzy decision boundaries and by an imbalance between difficult and easy samples in the training data. This paper first proposes an additive angle penalty focus loss function (APFL), which refines the fuzzy decision boundary by introducing angle penalty factors that improve compactness within each class and enlarge the distance between classes. It also assigns a larger loss to difficult samples so that the model pays more attention to them, as they are easily misclassified. In addition, to address the lack of training samples, a multimodal and multitask learning framework with APFL is proposed. It extracts spectrogram features with a deep neural network, text features with a pretrained language model, and audio features with a pretrained sound model, and it uses gender recognition as an auxiliary task. The experimental results verify the effectiveness of the proposed loss function and framework.
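The abstract does not give the formula for APFL, so the following is only a minimal sketch of the two ideas it names, not the authors' definition. It assumes an additive angular margin on the target class (in the style of margin-based softmax losses) combined with a focal-style weight `(1 - p_t) ** gamma` that raises the loss on hard samples. The function name `angular_focal_loss` and the parameters `margin`, `scale`, and `gamma` are hypothetical and chosen for illustration.

```python
import math

def angular_focal_loss(cosines, target, margin=0.5, scale=30.0, gamma=2.0):
    """Illustrative loss for one sample (assumed form, not the paper's).

    cosines: cosine similarity between the normalized embedding and each
             class weight vector, one value per class.
    target:  index of the true class.
    """
    logits = []
    for j, c in enumerate(cosines):
        c = max(-1.0, min(1.0, c))  # guard acos against rounding error
        if j == target:
            # Additive angular penalty: the true class must match the
            # embedding with an extra angle of `margin` to spare.
            theta = math.acos(c)
            c = math.cos(min(theta + margin, math.pi))
        logits.append(scale * c)

    # Numerically stable log-softmax for the target class.
    m = max(logits)
    log_sum = math.log(sum(math.exp(z - m) for z in logits))
    log_p_t = logits[target] - m - log_sum
    p_t = math.exp(log_p_t)

    # Focal-style weighting: confident (easy) samples are down-weighted,
    # so hard samples dominate the training signal.
    return -((1.0 - p_t) ** gamma) * log_p_t

# An easy sample (true class is the closest) versus a hard one.
easy = angular_focal_loss([0.9, 0.1, -0.2], target=0)
hard = angular_focal_loss([0.1, 0.9, -0.2], target=0)
print(easy < hard)
```

With `margin=0.0` and `gamma=0.0` the function reduces to ordinary softmax cross-entropy on the scaled cosines, which makes it easy to see what each of the two extra terms contributes.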

Funder

National Natural Science Foundation of China

Publisher

Hindawi Limited

Subject

Artificial Intelligence, Human-Computer Interaction, Theoretical Computer Science, Software

Link

http://downloads.hindawi.com/journals/ijis/2023/3662839.pdf
