Abstract
Speech emotion recognition is an emerging research field of great significance to human–computer interaction. To help smart devices better recognize and understand the emotions conveyed in human speech, and to address the vanishing gradients and weak learning of temporal information in current speech emotion classification models, this paper proposes an AA-CBGRU network model for speech emotion recognition. The model first extracts the spectrogram of the speech signal together with its first- and second-order derivatives, then extracts spatial features from these inputs with a convolutional neural network containing residual blocks, then mines deep temporal information with a BGRU network equipped with an attention layer, and finally performs emotion classification with a fully connected layer. Experimental results on the IEMOCAP emotion corpus show that the proposed model improves both the weighted accuracy (WA) and the unweighted accuracy (UA).
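The abstract does not define its two metrics. Under the convention commonly used in speech emotion recognition, WA is the overall accuracy across all utterances and UA is the mean of the per-class recalls, so that rare emotion classes count as much as frequent ones. The sketch below assumes that convention; the function names and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified utterances / all utterances."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every emotion class counts equally."""
    total = defaultdict(int)  # utterances per true class
    hit = defaultdict(int)    # correctly classified utterances per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Hypothetical labels for illustration only (not results from the paper).
y_true = ["angry", "angry", "angry", "angry", "happy", "sad"]
y_pred = ["angry", "angry", "angry", "sad", "happy", "happy"]

wa = weighted_accuracy(y_true, y_pred)
ua = unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.4f}, UA = {ua:.4f}")
```

On an imbalanced corpus the two values can diverge: a model that favors the majority class can score a high WA while its UA stays low, which is one reason to report both.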
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering
Cited by
11 articles.