Affiliation:
1. School of Computing and Information Science Research, Anglia Ruskin University, Cambridge CB1 1PT, UK
Abstract
Speech emotion recognition is an important research topic that can help to maintain and improve public health and contribute to the ongoing progress of healthcare technology. Recent advances in speech emotion recognition systems include the use of deep learning models and new acoustic and temporal features. This paper proposes a self-attention-based deep learning model that combines a two-dimensional Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network. The research builds on the existing literature to identify the best-performing features for this task through extensive experiments on different combinations of spectral and rhythmic information. Mel Frequency Cepstral Coefficients (MFCCs) emerged as the best-performing features. The experiments were performed on a customised dataset that combines the RAVDESS, SAVEE, and TESS datasets. Eight emotional states (happy, sad, angry, surprise, disgust, calm, fearful, and neutral) were detected. The proposed attention-based deep learning model achieved an average test accuracy of 90%, which is a substantial improvement over established models. Hence, this emotion detection model has the potential to improve automated mental health monitoring.
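The abstract does not state how the self-attention layer is formulated, so the following is only a minimal sketch of one common variant, scaled dot-product self-attention, applied to a short sequence of per-frame feature vectors. It assumes identity projections (queries, keys, and values are all the input frames) and uses made-up toy values in place of real CNN-LSTM outputs; the names `self_attention` and `toy_frames` are illustrative and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    Assumption: Q = K = V = frames (no learned weight matrices), which is
    a simplification and not necessarily the formulation used in the paper.

    frames: list of T feature vectors, each of length d.
    Returns (outputs, weights): outputs is T x d; weights is T x T and
    every row of weights sums to 1.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for q in frames:
        # Similarity of this frame to every frame, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in frames]
        alpha = softmax(scores)
        weights.append(alpha)
        # Each output is a weighted average of all input frames.
        outputs.append(
            [sum(a * v[j] for a, v in zip(alpha, frames)) for j in range(d)]
        )
    return outputs, weights


# Hypothetical toy input: three frames with two features each.
toy_frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_outputs, toy_weights = self_attention(toy_frames)
```

Each output vector is a convex combination of the input frames, so frames that resemble the current one contribute more to its representation. In a full model, the attended sequence would typically be pooled and passed to a classifier over the eight emotion classes.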
Subject
Health, Toxicology and Mutagenesis; Public Health, Environmental and Occupational Health
Cited by
21 articles.