CLSTM: Deep Feature-Based Speech Emotion Recognition Using the Hierarchical ConvLSTM Network

Published: 2020-11-30
Volume: 8, Issue: 12, Page: 2133
ISSN: 2227-7390
Journal: Mathematics
Language: en

Author:

Mustaqeem; Kwon Soonil

Abstract

Artificial intelligence, deep learning, and machine learning are the dominant tools for making systems smarter. Smart speech emotion recognition (SER) is both a basic necessity and an emerging research area in digital audio signal processing, and it plays an important role in many applications related to human–computer interaction (HCI). Existing state-of-the-art SER systems have quite low prediction performance, which must improve before they are feasible for real-time commercial applications. The key reasons for the low accuracy and poor prediction rate are the scarcity of data and the model configuration, which make building a robust machine learning technique especially challenging. In this paper, we address the limitations of existing SER systems and propose a unique artificial intelligence (AI) based system structure for SER that uses hierarchical blocks of convolutional long short-term memory (ConvLSTM) with sequence learning. We design four ConvLSTM blocks, each called a local features learning block (LFLB), to extract local emotional features in a hierarchical correlation. The ConvLSTM layers are adopted for the input-to-state and state-to-state transitions so that spatial cues are extracted through convolution operations. The four LFLBs extract spatiotemporal cues in hierarchical correlational form from speech signals using a residual learning strategy. Furthermore, we use a novel sequence learning strategy to extract global information and adaptively adjust the relevant global feature weights according to the correlation of the input features. Finally, we use the center loss function together with the softmax loss to produce the class probabilities. The center loss improves the final classification results, ensures accurate prediction, and plays a conspicuous role in the proposed SER scheme. We tested the proposed system on two standard speech corpora, the Interactive Emotional Dyadic Motion Capture (IEMOCAP) and the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), and obtained recognition rates of 75% and 80%, respectively.
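
The abstract mentions combining the center loss with the softmax loss but gives no formula or code. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes the commonly used center loss formulation (half the squared Euclidean distance between a feature vector and the center of its class) added to the softmax cross-entropy with a weight. The weight `lam`, the function names, and all toy values are hypothetical, and the class centers are fixed here rather than learned.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(logits, label):
    """Cross-entropy of the softmax probabilities against the true label."""
    return -math.log(softmax(logits)[label])

def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))

def joint_loss(features, logits, labels, centers, lam=0.5):
    """Batch mean of softmax loss + lam * center loss (lam is an assumed weight)."""
    total = 0.0
    for x, z, y in zip(features, logits, labels):
        total += softmax_loss(z, y) + lam * center_loss(x, centers[y])
    return total / len(labels)

# Toy batch: 2-D features and two classes (all values are made up).
features = [[1.0, 0.0], [0.0, 1.0]]
logits = [[2.0, 0.0], [0.0, 2.0]]
labels = [0, 1]
centers = [[1.0, 0.0], [0.0, 1.0]]  # fixed here; in training they would be updated
loss = joint_loss(features, logits, labels, centers)
```

In this toy batch every feature sits exactly on its class center, so the center term is zero and only the softmax term contributes. Moving a feature away from its center raises the joint loss, which is how the center term encourages compact per-class feature clusters.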

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/8/12/2133/pdf

References: 65 articles.

1. Towards Repayment Prediction in Peer-to-Peer Social Lending Using Deep Learning

2. Clustering-Based Speech Emotion Recognition by Incorporating Learned Features and Deep BiLSTM;Sajjad;IEEE Access,2020

3. Evaluating the Suitability of a Smart Technology Application for Fall Detection Using a Fuzzy Collaborative Intelligence Approach

4. A CNN-Assisted Enhanced Audio Signal Processing for Speech Emotion Recognition;Kwon;Sensors,2020

5. Differential Evolution for Neural Networks Optimization

Cited by 94 articles.

1. Analyzing the influence of different speech data corpora and speech features on speech emotion recognition: A review;Speech Communication;2024-07

2. CST-UNet: Cross Swin Transformer Enhanced U-Net with Masked Bottleneck for Single-Channel Speech Enhancement;Circuits, Systems, and Signal Processing;2024-06-16

3. Novel sound event and sound activity detection framework based on intrinsic mode functions and deep learning;Multimedia Tools and Applications;2024-06-11

4. Enhancing speech emotion recognition through deep learning and handcrafted feature fusion;Applied Acoustics;2024-06

5. Assessing the effectiveness of ensembles in Speech Emotion Recognition: Performance analysis under challenging scenarios;Expert Systems with Applications;2024-06