Affiliation:
1. College of Electrical & Electronic Engineering, Shanghai Institute of Technology, Shanghai 201418, China
Abstract
Speech emotion recognition (SER) plays an important role in human-computer interaction (HCI) and has a wide range of applications in medicine, psychotherapy, and other fields. In recent years, with the development of deep learning, many researchers have combined feature extraction techniques with deep learning to extract more discriminative emotional information. However, a single speech emotion classification task makes it difficult to use feature information effectively, resulting in feature redundancy. This paper therefore uses speech feature enhancement (SFE) as an auxiliary task to provide additional information for the SER task. It combines Long Short-Term Memory (LSTM) networks with soft decision trees and proposes a multi-task learning framework based on a decision tree structure. Specifically, the LSTM network is trained by computing the distances of features at the different leaf nodes of the soft decision tree, thereby producing an enhanced speech feature representation. The results show that the algorithm achieves 85.6% accuracy on the EMO-DB dataset and 81.3% accuracy on the CASIA dataset, an improvement over the baseline of 11.8% on EMO-DB and 14.9% on CASIA, demonstrating the effectiveness of the method. In addition, we conducted cross-database experiments, real-time performance analysis, and noise-environment analysis to validate the robustness and practicality of the method. These analyses further show that the approach performs reliably across databases, maintains real-time processing capability, and is robust to noisy environments.
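As a rough illustration of the architecture the abstract describes, the sketch below pairs an LSTM encoder with a main SER head, an auxiliary SFE head, and a soft decision tree whose leaf-node distances supply an extra training signal. All layer sizes, class names, the choice of MFCC inputs, and the expected-leaf-distance loss are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn as nn

class SoftDecisionTree(nn.Module):
    """Depth-d soft decision tree: inner nodes are sigmoid gates, leaves hold learnable prototypes."""
    def __init__(self, feat_dim, depth=3):
        super().__init__()
        self.depth = depth
        n_inner = 2 ** depth - 1
        n_leaves = 2 ** depth
        self.gates = nn.Linear(feat_dim, n_inner)                     # one routing logit per inner node
        self.leaves = nn.Parameter(torch.randn(n_leaves, feat_dim))   # leaf prototypes

    def forward(self, x):
        # Probability of reaching each leaf, computed level by level.
        p_right = torch.sigmoid(self.gates(x))                        # (B, n_inner)
        leaf_prob = x.new_ones(x.size(0), 1)
        idx = 0
        for d in range(self.depth):
            n = 2 ** d
            g = p_right[:, idx:idx + n]                               # gates at this depth level
            leaf_prob = torch.stack([leaf_prob * (1 - g), leaf_prob * g], dim=-1).flatten(1)
            idx += n
        # Distance of the LSTM feature to every leaf prototype.
        dist = torch.cdist(x, self.leaves)                            # (B, n_leaves)
        return leaf_prob, dist

class MultiTaskSER(nn.Module):
    """Shared LSTM encoder with SER and SFE heads plus a soft-decision-tree regularizer (assumed layout)."""
    def __init__(self, n_mfcc=39, hidden=128, n_emotions=7):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
        self.ser_head = nn.Linear(hidden, n_emotions)                 # main SER classification task
        self.sfe_head = nn.Linear(hidden, n_mfcc)                     # auxiliary speech feature enhancement task
        self.tree = SoftDecisionTree(hidden)

    def forward(self, x):
        _, (h, _) = self.lstm(x)                                      # x: (B, T, n_mfcc)
        feat = h[-1]                                                  # final hidden state as utterance feature
        leaf_prob, dist = self.tree(feat)
        tree_loss = (leaf_prob * dist).sum(dim=1).mean()              # expected distance to reached leaves
        return self.ser_head(feat), self.sfe_head(feat), tree_loss

if __name__ == "__main__":
    model = MultiTaskSER()
    frames = torch.randn(4, 100, 39)                                  # 4 utterances, 100 frames of 39-dim MFCCs
    logits, enhanced, tree_loss = model(frames)
    print(logits.shape, enhanced.shape, tree_loss.item())

In such a setup the tree loss would be combined with the SER cross-entropy and an SFE reconstruction loss during training; the weighting of the three terms is left unspecified here.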