Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition

Published: 2020
Page: 283-293
Container-title: Cognitive Analytics

Author:

Trabelsi Imen¹, Bouhlel Med Salim¹

Affiliation:

1. Sciences and Technologies of Image and Telecommunications (SETIT), University of Sfax, Tunisia

Abstract

Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human-Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel-frequency cepstral coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstral coefficients (LPCC), Perceptual Linear Prediction (PLP), and Relative Spectral Perceptual Linear Prediction (RASTA-PLP) features are used to characterize the emotional utterances, and classification combines Gaussian mixture models (GMM) with Support Vector Machines (SVM) based on the Kullback-Leibler divergence kernel. The effects of feature type and feature dimension are compared. The best results are obtained with 12-coefficient MFCC. With the proposed features, a recognition rate of 84% is achieved, which is close to human performance on this database.
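To make the GMM and SVM combination in the abstract more concrete, here is a minimal sketch of one Kullback-Leibler-based kernel that is widely used in the GMM-supervector literature. The abstract does not give the exact kernel form, so this choice is an assumption and may differ from the authors' method. The sketch also assumes that every utterance GMM shares the same mixture weights and diagonal variances (for example, after adapting only the means of a common background model). The function names `kl_supervector` and `kl_kernel` are hypothetical.

```python
import numpy as np


def kl_supervector(means, weights, variances):
    """Stack scaled GMM means into a single vector.

    means:     (n_components, n_features) per-utterance component means
    weights:   (n_components,) shared mixture weights (assumption)
    variances: (n_components, n_features) shared diagonal variances (assumption)
    """
    means = np.asarray(means, dtype=float)
    weights = np.asarray(weights, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Scale each mean by sqrt(w_i) / sigma_i so that a plain dot product
    # of two supervectors equals sum_i w_i * mu_a_i^T Sigma_i^{-1} mu_b_i.
    scaled = np.sqrt(weights)[:, None] * means / np.sqrt(variances)
    return scaled.ravel()


def kl_kernel(means_a, means_b, weights, variances):
    """Linear kernel between the scaled supervectors of two utterances."""
    sv_a = kl_supervector(means_a, weights, variances)
    sv_b = kl_supervector(means_b, weights, variances)
    return float(sv_a @ sv_b)
```

A Gram matrix built from `kl_kernel` over all training utterances could then be passed to an SVM that accepts a precomputed kernel. With 12-coefficient MFCC features, `n_features` would be 12.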

Publisher

IGI Global
