A Comparison of Machine Learning Algorithms and Feature Sets for Automatic Vocal Emotion Recognition in Speech

Published:2022-10-06 Issue:19 Volume:22 Page:7561
ISSN:1424-8220
Container-title:Sensors
language:en
Short-container-title:Sensors

Author:

Doğdu Cem, Kessler Thomas, Schneider Dana, Shadaydeh Maha, Schweinberger Stefan R.

Abstract

Vocal emotion recognition (VER) in natural speech, often referred to as speech emotion recognition (SER), remains challenging for both humans and computers. Applied fields including clinical diagnosis and intervention, social interaction research or human-computer interaction (HCI) increasingly benefit from efficient VER algorithms. Several feature sets have been used with machine-learning (ML) algorithms for discrete emotion classification. However, there is no consensus on which low-level descriptors and classifiers are optimal. Therefore, we aimed to compare the performance of ML algorithms across several feature sets. Concretely, seven ML algorithms were compared on the Berlin Database of Emotional Speech using 10-fold cross-validation and four openSMILE feature sets (IS-09, emobase, GeMAPS and eGeMAPS): Multilayer Perceptron Neural Network (MLP), J48 Decision Tree (DT), Support Vector Machine with Sequential Minimal Optimization (SMO), Random Forest (RF), k-Nearest Neighbor (KNN), Simple Logistic Regression (LOG) and Multinomial Logistic Regression (MLR). Results indicated that SMO, MLP and LOG performed better (reaching accuracies of up to 87.85%, 84.00% and 83.74%, respectively) than RF, DT, MLR and KNN (with minimum accuracies of 73.46%, 53.08%, 70.65% and 58.69%, respectively). Overall, the emobase feature set performed best. We discuss the implications of these findings for applications in diagnosis, intervention or HCI.
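
The evaluation protocol named in the abstract, 10-fold cross-validation of a classifier on per-utterance feature vectors, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the data are synthetic 2-D points standing in for openSMILE feature vectors, the emotion labels are placeholders, the folds are stratified by class (an assumption, since the abstract does not say), and only a small hand-written k-nearest-neighbor classifier is included. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class's samples round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def knn_predict(train_x, train_y, query, n_neighbors=3):
    """Return the majority label among the nearest training points."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:n_neighbors])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(features, labels, k=10, n_neighbors=3, seed=0):
    """Accuracy pooled over all k held-out folds."""
    folds = stratified_folds(labels, k, seed)
    correct = 0
    for held_out in folds:
        test_idx = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train_x = [features[i] for i in range(len(labels)) if i not in test_idx]
        train_y = [labels[i] for i in range(len(labels)) if i not in test_idx]
        for i in held_out:
            if knn_predict(train_x, train_y, features[i], n_neighbors) == labels[i]:
                correct += 1
    return correct / len(labels)


def make_synthetic_data(n_per_class=30, seed=1):
    """Synthetic stand-in for acoustic features: one Gaussian blob per label."""
    rng = random.Random(seed)
    centers = {"anger": (0.0, 0.0), "sadness": (5.0, 5.0), "happiness": (0.0, 5.0)}
    features, labels = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            features.append((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)))
            labels.append(label)
    return features, labels


if __name__ == "__main__":
    X, y = make_synthetic_data()
    acc = cross_val_accuracy(X, y)
    print(f"10-fold CV accuracy (KNN, synthetic data): {acc:.2%}")
```

Comparing several algorithms under this scheme amounts to swapping `knn_predict` for another classifier while keeping the fold assignment fixed, so that every model is scored on the same held-out samples.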

Funder

Carl Zeiss Foundation

Competence Center for Interdisciplinary Prevention at Friedrich Schiller University

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Link

https://www.mdpi.com/1424-8220/22/19/7561/pdf

References: 53 articles.

1. Speech emotion recognition

2. Towards the automatic detection of social biomarkers in autism spectrum disorder: introducing the simulated interaction task (SIT)

3. Sensor-Based Technology for Social Information Processing in Autism: A Review

4. A review of depression and suicide risk assessment using speech analysis

5. A hierarchical depression detection model based on vocal and emotional cues

Cited by 11 articles.

1. Speech-based recognition and estimating severity of PTSD using machine learning; Journal of Affective Disorders; 2024-10

2. Emotion Recognition on Speech Attributes Using Machine Learning; 2024 IEEE International Conference on Information Technology, Electronics and Intelligent Communication Systems (ICITEICS); 2024-06-28

3. Assessment of Pepper Robot’s Speech Recognition System through the Lens of Machine Learning; Biomimetics; 2024-06-27

4. Assessing the effectiveness of ensembles in Speech Emotion Recognition: Performance analysis under challenging scenarios; Expert Systems with Applications; 2024-06

5. Unveiling hidden factors: explainable AI for feature boosting in speech emotion recognition; Applied Intelligence; 2024-05-31