Author:
Sezgin Mehmet Cenk, Gunsel Bilge, Kurt Gunes Karabulut
Abstract
In this article, we propose a new set of acoustic features for automatic emotion recognition from audio. The features are based on the perceptual quality metrics defined in the Perceptual Evaluation of Audio Quality (PEAQ) method, ITU-R Recommendation BS.1387. Starting from the outer- and middle-ear models of the auditory system, we base our features on the masked perceptual loudness, which defines relatively objective criteria for emotion detection. The features, computed in critical bands following the reference-signal concept, include the partial loudness of the emotional difference, the emotional difference-to-perceptual mask ratio, measures of alterations of temporal envelopes, measures of harmonics of the emotional difference, the occurrence probability of emotional blocks, and the perceptual bandwidth. A soft-majority voting decision rule that strengthens conventional majority voting is proposed to assess the classifier outputs. Compared to state-of-the-art systems, including the Munich Open-Source Emotion and Affect Recognition Toolkit, the Hidden Markov Model Toolkit, and Generalized Discriminant Analysis, the emotion recognition rates are shown to improve by 7-16% on EMO-DB and by 7-11% on VAM for the "all" and "valence" tasks.
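As a rough illustration of the kind of decision rule the abstract describes, the sketch below aggregates per-block classifier scores by summing them instead of counting hard votes, so that a confident block contributes more than a marginal one. This is a generic soft-voting scheme, not the authors' exact formulation; the function name soft_majority_vote, the per-block normalization, and the example numbers are illustrative assumptions.

```python
import numpy as np

def soft_majority_vote(block_scores):
    """Aggregate per-block class scores into one utterance-level label.

    block_scores: array-like of shape (n_blocks, n_classes) holding each
    block's class scores (e.g. posterior probabilities or decision values).
    Hard majority voting takes an argmax per block and counts the labels;
    this soft variant sums the (normalized) scores so that more confident
    blocks carry more weight.  Generic sketch, not the paper's exact rule.
    """
    block_scores = np.asarray(block_scores, dtype=float)
    # Normalize each block's scores so every block contributes one "vote".
    row_sums = block_scores.sum(axis=1, keepdims=True)
    normalized = block_scores / np.clip(row_sums, 1e-12, None)
    # Sum the fractional votes over all blocks and pick the strongest class.
    totals = normalized.sum(axis=0)
    return int(np.argmax(totals)), totals

# Hypothetical example: three blocks scored over three emotion classes.
scores = [[0.34, 0.33, 0.33],
          [0.40, 0.35, 0.25],
          [0.10, 0.80, 0.10]]
label, totals = soft_majority_vote(scores)
print(label, totals)
```

With these numbers, hard majority voting would pick class 0 (two of the three blocks barely prefer it), while the summed scores favor class 1 because the third block is far more confident, which is the sense in which soft voting can strengthen a plain vote count.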
Publisher
Springer Science and Business Media LLC
Subject
Electrical and Electronic Engineering, Acoustics and Ultrasonics
Cited by 43 articles.
1. Enhancing Speech Audio Emotion Recognition for Diverse Feature Analysis through MLP Classifier;2024 International Conference on Cognitive Robotics and Intelligent Systems (ICC - ROBINS);2024-04-17
2. DHERF: A Deep Learning Ensemble Feature Extraction Framework for Emotion Recognition Using Enhanced-CNN;Journal of Advances in Information Technology;2024
3. Using Convolutional Neural Networks for Music Emotion Recognition on Microcontrollers;2023 IEEE 64th International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS);2023-10-05
4. Improved Speech Emotion Classification Using Deep Neural Network;Circuits, Systems, and Signal Processing;2023-07-29
5. Voice Data-Mining on Audio from Audio and Video Clips;Smart Innovation, Systems and Technologies;2023