Attention-LSTM-Attention Model for Speech Emotion Recognition and Analysis of IEMOCAP Database

Published:2020-04-26 Issue:5 Volume:9 Page:713
ISSN:2079-9292
Container-title:Electronics
language:en
Short-container-title:Electronics

Author:

Yu Yeonguk, Kim Yoon-Joong

Abstract

We propose a speech emotion recognition (SER) model with an “attention-Long Short-Term Memory (LSTM)-attention” component that combines IS09, a feature set commonly used for SER, with the mel spectrogram, and we analyze the reliability problem of the Interactive Emotional Dyadic Motion Capture (IEMOCAP) database. The model’s attention mechanism focuses on the emotion-related elements of the IS09 and mel spectrogram features and on the emotion-related time spans within those features, so that the model extracts emotion information from a given speech signal. In the baseline study, the proposed model achieved a weighted accuracy (WA) of 68% on the improvised dataset of IEMOCAP. However, neither the main-study model nor its modified variants exceeded a WA of 68% on the improvised dataset. We attribute this to the limited reliability of the IEMOCAP dataset. A more reliable dataset is required for a more accurate evaluation of the model’s performance. Therefore, in this study, we reconstructed a more reliable dataset based on the labeling results provided by IEMOCAP. On this more reliable dataset, the model achieved a WA of 73%.
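The time-axis attention mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple dot-product scoring of each frame against a hypothetical learned vector (`query`), followed by a softmax over time and a weighted sum. The names `attention_pool`, `frames`, and `query` are made up for illustration, and the frame values stand in for LSTM outputs.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Collapse T frame vectors into one vector using attention over time.

    frames: list of T vectors of length D (assumed stand-ins for LSTM outputs)
    query:  hypothetical learned vector of length D used to score each frame
    Returns (weights, pooled); weights sum to 1 across the T frames.
    """
    # Score each frame by its dot product with the query vector.
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    # Normalize scores into attention weights over the time axis.
    weights = softmax(scores)
    # Weighted sum of frames: high-weight frames dominate the result.
    dim = len(frames[0])
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return weights, pooled


# Toy input: three frames, where the middle one scores highest.
frames = [[0.1, 0.0], [2.0, 1.0], [0.2, 0.1]]
query = [1.0, 1.0]
weights, pooled = attention_pool(frames, query)
print(weights)
print(pooled)
```

In this toy input the middle frame receives the largest weight, so the pooled vector is dominated by it. The same idea, applied along the feature axis rather than the time axis, gives a rough picture of attention over feature elements.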

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering

Link

https://www.mdpi.com/2079-9292/9/5/713/pdf

Cited by 55 articles.

1. Recognition of Western Black-Crested Gibbon Call Signatures Based on SA_DenseNet-LSTM-Attention Network;Sustainability;2024-08-30

2. Squeeze-and-excitation 3D convolutional attention recurrent network for end-to-end speech emotion recognition;Applied Soft Computing;2024-08

3. A review on emotion detection by using deep learning techniques;Artificial Intelligence Review;2024-07-11

4. MSER: Multimodal speech emotion recognition using cross-attention with deep fusion;Expert Systems with Applications;2024-07

5. Designing LSTM Networks for Emotion Modelling;Advances in Psychology, Mental Health, and Behavioral Studies;2024-04-12