Affiliation:
1. SRM Institute of Science and Technology, India
Abstract
Deep learning (DL), a branch of artificial intelligence (AI), uses neural networks to build models that perform tasks that would otherwise require human intelligence. In speech emotion recognition (SpEmRe), DL techniques are used to predict human emotions from audio signals; a SpEmRe system identifies the different types of emotion present in audio samples. This work applies a bi-directional long short-term memory (Bi-LSTM) model to the SpEmRe process, addressing the growing interest in understanding and interpreting human emotions through speech. The objective was to develop a deep learning model capable of accurately identifying and classifying emotional states from speech data. The training phase achieved high accuracy, indicating the model's proficiency in learning from the dataset. The SpEmRe model is built using hybrid features fed to a convolutional neural network (CNN). Detailed experiments are conducted to compare the performance of an improved convolutional neural network - bi-directional long short-term memory (CNN-BiLSTM) algorithm with that of other machine learning (ML) algorithms.