Multimodal Affective Communication Analysis: Fusing Speech Emotion and Text Sentiment Using Machine Learning

Published: 2024-07-29 Issue: 15 Volume: 14 Page: 6631
ISSN: 2076-3417
Container-title: Applied Sciences
Language: en
Short-container-title: Applied Sciences

Author:

Resende Faria Diego¹, Weinberg Abraham Itzhak², Ayrosa Pedro Paulo³

Affiliation:

1. School of Physics, Engineering and Computer Science, University of Hertfordshire, Hertfordshire AL10 9AB, UK

2. AI-Weinberg AI Experts, Tel Aviv 90850, Israel

3. LABTED and Computer Science Department, State University of Londrina, Londrina 86057-970, Brazil

Abstract

Affective communication, encompassing verbal and non-verbal cues, is crucial for understanding human interactions. This study introduces a novel framework for enhancing emotional understanding by fusing speech emotion recognition (SER) and sentiment analysis (SA). To recognize and categorize emotions in speech, we combine diverse features with both classical and deep learning models: Gaussian naive Bayes (GNB), support vector machines (SVMs), random forests (RFs), a multilayer perceptron (MLP), and a 1D convolutional neural network (1D-CNN). We further extract text sentiment from speech-to-text transcripts, analyzing it with the pre-trained models bidirectional encoder representations from transformers (BERT) and generative pre-trained transformer 2 (GPT-2), as well as logistic regression (LR). To improve on individual model performance for both SER and SA, we employ an extended dynamic Bayesian mixture model (DBMM) ensemble classifier. Our most significant contribution is a two-layered DBMM (2L-DBMM) for multimodal fusion, which integrates speech emotion and text sentiment and enables the classification of more nuanced, second-level emotional states. Evaluated on the EmoUERJ (Portuguese) and ESD (English) datasets, respectively, the extended DBMM achieves accuracies of 96% and 98% for SER and 85% and 95% for SA, and the 2L-DBMM achieves 96% and 98% for combined emotion classification. Our findings show that the extended DBMM outperforms the individual classifiers within each modality and that the 2L-DBMM is effective for merging the two modalities, highlighting the value of ensemble methods and multimodal fusion in affective communication analysis. The results underscore the potential of our approach for enhancing emotional understanding, with broad applications in fields such as mental health assessment, human–robot interaction, and cross-cultural communication.
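
The abstract does not give the DBMM equations, so the following is only a minimal sketch of the general idea of a weighted Bayesian mixture of classifier posteriors, not the authors' formulation. It assumes that each base classifier outputs a posterior vector over the same classes, that mixture weights are derived from the mean entropy of each classifier's posteriors on held-out data (lower entropy gives a larger weight), and that an optional prior (for example, the previous time step's posterior) multiplies the mixture. The function names `entropy_weights` and `fuse`, the input formats, and all numbers are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


def entropy_weights(validation_posteriors):
    """Return one normalized weight per classifier.

    validation_posteriors[i] is a list of posterior vectors produced by
    classifier i on held-out data (assumed input format). Classifiers
    with lower mean entropy receive larger weights.
    """
    mean_h = [sum(entropy(p) for p in posts) / len(posts)
              for posts in validation_posteriors]
    inv = [1.0 / (h + 1e-9) for h in mean_h]  # small constant avoids division by zero
    total = sum(inv)
    return [v / total for v in inv]


def fuse(posteriors, weights, prior=None):
    """Weighted mixture of posteriors, multiplied by a prior and renormalized."""
    n_classes = len(posteriors[0])
    if prior is None:
        prior = [1.0 / n_classes] * n_classes  # uniform prior by default
    mixed = [prior[c] * sum(w * p[c] for w, p in zip(weights, posteriors))
             for c in range(n_classes)]
    z = sum(mixed)
    return [m / z for m in mixed]


# Layer 1 (assumed): fuse the base classifiers within each modality.
speech = fuse([[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]], [0.5, 0.5])
text = fuse([[0.5, 0.4, 0.1], [0.4, 0.4, 0.2]], [0.5, 0.5])

# Layer 2 (assumed): fuse the two modality-level posteriors.
combined = fuse([speech, text], [0.6, 0.4])
```

In this sketch both layers share one label set for simplicity; the paper's second layer targets second-level emotional states, whose mapping from speech emotion and text sentiment is not described in the abstract.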

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/15/6631/pdf

References (43 articles)

1. Mellouk (2020). Facial emotion recognition using deep learning: Review and insights. Procedia Comput. Sci.

2. Faria, D.R., Vieira, M., Faria, F.C.C., and Premebida, C. (2017, January 28–31). Affective Facial Expressions Recognition for Human-Robot Interaction. Proceedings of the IEEE RO-MAN’17: IEEE International Symposium on Robot and Human Interactive Communication, Lisbon, Portugal.

3. Golzadeh, H., Faria, D.R., Manso, L., Ekart, A., and Buckingham, C. (2018, January 25–27). Emotion Recognition using Spatiotemporal Features from Facial Expression Landmarks. Proceedings of the 9th IEEE International Conference on Intelligent Systems, Madeira, Portugal.

4. Faria, D.R., Vieira, M., and Faria, F.C.C. (2017, January 21–23). Towards the Development of Affective Facial Expression Recognition for Human-Robot Interaction. Proceedings of the ACM PETRA’17: 10th International Conference on Pervasive Technologies Related to Assistive Environments, Island of Rhodes, Greece.

5. Bird, J.J., Ekart, A., Buckingham, C.D., and Faria, D.R. (2019, January 29–30). Mental Emotional Sentiment Classification with an EEG-based Brain-Machine Interface. Proceedings of the International Conference on Digital Image & Signal Processing (DISP’19), Oxford, UK.