Affiliation:
1. The Graduate School of Advanced Imaging Science, Multimedia and Film, Chung-Ang University, Seoul 06974, Republic of Korea
Abstract
In this study, we present a framework for improving the accuracy of speech emotion recognition in a multilingual environment. In our prior experiments, where machine learning (ML) models were trained to predict emotions in Korean and then tested on English, and vice versa, we observed a language dependency in emotion recognition that resulted in poor accuracy. We suspect that this is related to spectral differences in certain emotions between Korean and English and to the tendency of formants to occur at different acoustic frequencies across the two languages. For this study, we investigated several methods, including models trained on mixed databases, models trained on a single database, and bagging, boosting, and voting ML algorithms. Finally, we developed a framework consisting of two branches: one that aggregates high-dimensional features from multilingual data and one that performs two-layered ensemble emotion classification. In the ensemble framework for Korean and English (EF-KEN), features are extracted and ensemble models are trained, boosted, and evaluated by applying them to the two spoken languages (English and Korean). The final experimental results demonstrate a meaningful improvement in an environment with two different languages.
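The two-layered ensemble described above can be sketched roughly as follows. This is a minimal illustration, not the authors' EF-KEN code: it assumes scikit-learn, uses synthetic features in place of the high-dimensional acoustic features extracted from Korean and English speech, and combines a bagging and a boosting classifier in a second soft-voting layer.

```python
# Minimal sketch (assumption: scikit-learn) of a two-layered ensemble:
# layer 1 trains bagging and boosting classifiers on pooled features;
# layer 2 combines their probabilities by soft voting.
import numpy as np
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              VotingClassifier)
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for multilingual acoustic features (hypothetical sizes:
# 600 utterances, 20 features, binary "emotion" label).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# First layer: independently trained ensemble learners.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            random_state=0)
boosting = GradientBoostingClassifier(random_state=0)

# Second layer: soft voting over the first-layer ensembles.
ensemble = VotingClassifier([("bag", bagging), ("boost", boosting)],
                            voting="soft")
ensemble.fit(X_tr, y_tr)
acc = ensemble.score(X_te, y_te)
print(f"held-out accuracy: {acc:.2f}")
```

In the actual framework, the synthetic matrix would be replaced by features aggregated from the Korean and English databases, and the ensemble would predict multi-class emotion labels rather than a binary target.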
Subject
General Engineering, General Mathematics