Real-Time Speech Emotion Recognition Using Deep Learning and Data Augmentation

Published: 2023-05-03

Author:

Barhoumi Chawki¹, Ayed Yassine Ben²

Affiliation:

1. National School of Electronics and Telecommunications of Sfax

2. Multimedia, InfoRmation systems and Advanced Computing Laboratory

Abstract

In human-human interactions, detecting emotions is often easy because they can be perceived through facial expressions, body gestures, or speech. In human-machine interactions, however, detecting human emotion can be a challenge. To improve this interaction, the field of speech emotion recognition has emerged, with the goal of recognizing emotions solely through vocal intonation. In this work, we propose a speech emotion recognition system based on deep learning approaches and two efficient data augmentation techniques (noise addition and spectrogram shifting). To evaluate the proposed system, we used three different datasets: TESS, EmoDB, and RAVDESS. We employ several feature extraction algorithms, such as Mel Frequency Cepstral Coefficients (MFCC), Zero Crossing Rate (ZCR), Mel spectrograms, Root Mean Square (RMS) value, and chroma, to select the most appropriate vocal features that represent speech emotions. To develop our speech emotion recognition system, we use three different deep learning models: a MultiLayer Perceptron (MLP), a Convolutional Neural Network (CNN), and a hybrid model that combines a CNN with Bidirectional Long Short-Term Memory (Bi-LSTM). By exploring these different approaches, we were able to identify the most effective model for accurately identifying emotional states from speech signals in real-time situations. Overall, our work demonstrates the effectiveness of the proposed deep learning model, specifically the one based on CNN+BiLSTM, and of the two data augmentation techniques used, for real-time speech emotion recognition.
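
The two augmentation techniques named in the abstract can be illustrated with a minimal sketch. The abstract does not give their parameters, so everything here is an assumption: the function names `add_noise` and `shift_spectrogram` are hypothetical, the noise is white Gaussian noise scaled to a target signal-to-noise ratio, and the shift is a circular shift along the time axis. This is not the authors' implementation.

```python
import math
import random


def add_noise(signal, snr_db, rng=None):
    """Return a copy of `signal` with white Gaussian noise at roughly `snr_db` dB SNR.

    Assumption: the paper's noise type and level are not stated in the abstract.
    """
    if not signal:
        return []
    rng = rng or random.Random()
    # Mean power of the clean signal.
    power = sum(x * x for x in signal) / len(signal)
    # Noise power that yields the requested signal-to-noise ratio.
    noise_power = power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def shift_spectrogram(spec, shift):
    """Circularly shift a spectrogram along the time axis.

    `spec` is a list of frequency rows, each a list of per-frame values.
    Assumption: a wrap-around time shift; the paper may shift differently.
    """
    if not spec or not spec[0]:
        return [list(row) for row in spec]
    n_frames = len(spec[0])
    k = shift % n_frames
    # A positive shift moves content to later frames; frames pushed past
    # the end wrap around to the start.
    return [list(row[n_frames - k:]) + list(row[:n_frames - k]) for row in spec]


# Usage: augment a synthetic one-second 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy_tone = add_noise(tone, snr_db=20, rng=random.Random(0))

toy_spec = [[1, 2, 3, 4], [5, 6, 7, 8]]
shifted = shift_spectrogram(toy_spec, 1)  # [[4, 1, 2, 3], [8, 5, 6, 7]]
```

In practice the spectrogram would come from a feature extractor rather than a hand-written list, and each augmented copy would be added to the training set alongside the original sample.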

Publisher

Research Square Platform LLC

Cited by 2 articles.

1. Machine Learning Approach for Detection of Speech Emotions for RAVDESS Audio Dataset;2024 Fourth International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT);2024-01-11

2. Deep Learning Algorithms for Speech Emotion Recognition with Hybrid Spectral Features;SN Computer Science;2023-11-16