Author:
Zhang Hua, Gou Ruoyun, Shang Jili, Shen Fangyao, Wu Yifan, Dai Guojun
Abstract
Speech emotion recognition (SER) is a challenging task because emotional expression varies between speakers. SER performance depends heavily on the features extracted from speech signals, and building an effective feature extraction and classification model remains difficult. In this paper, we propose a new SER method based on a Deep Convolutional Neural Network (DCNN) and a Bidirectional Long Short-Term Memory with Attention (BLSTMwA) model (DCNN-BLSTMwA). First, we preprocess the speech samples with data enhancement and dataset balancing. Second, we extract three channels of log Mel-spectrograms (static, delta, and delta-delta) as the DCNN input. A DCNN model pre-trained on the ImageNet dataset is then applied to generate segment-level features, and the features of each sentence are stacked into utterance-level features. Next, we adopt a BLSTM to learn high-level emotional features for temporal summarization, followed by an attention layer that focuses on emotionally relevant features. Finally, the learned high-level emotional features are fed into a Deep Neural Network (DNN) to predict the final emotion. Experiments on the EMO-DB and IEMOCAP databases obtain an unweighted average recall (UAR) of 87.86% and 68.50%, respectively, which is better than most popular SER methods and demonstrates the effectiveness of the proposed method.
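The UAR reported above is the mean of the per-class recalls, so every emotion class counts equally however many samples it has. A minimal pure-Python sketch of that metric follows; the function name and the toy labels are illustrative assumptions and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (each class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("no samples given")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)


# Toy, made-up labels (hypothetical, not from EMO-DB or IEMOCAP):
# an imbalanced set with 4 "neutral" and 2 "angry" samples.
y_true = ["neutral", "neutral", "neutral", "neutral", "angry", "angry"]
y_pred = ["neutral", "neutral", "neutral", "angry", "angry", "neutral"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In this toy case the recalls are 3/4 for "neutral" and 1/2 for "angry", giving a UAR of 0.625, while plain accuracy is 4/6. The gap shows why UAR is a common choice when emotion classes are imbalanced.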
Subject
Physiology (medical), Physiology
Cited by
21 articles.