Authors:
Kumar Sandeep, Yadav Jainath
Abstract
Long Short-Term Memory (LSTM) captures long-term dependencies more accurately than other types of neural networks, and it is frequently used in deep learning. In this work, we explore a Deep LSTM with a dropout layer, which mitigates overfitting during training. We use the IITKGP-SEHSC emotional speech dataset for emotion recognition, considering five emotions, namely angry, fear, happy, neutral, and sad, recorded from male and female speech. Since the IITKGP-SEHSC dataset is monolingual, spectral features alone are sufficient for emotion recognition. Traditional MFCCs emphasize low-frequency information. Here, we explore two features: the Gammatone Mel Frequency Cepstral Coefficient (GMFCC) and the Discrete Wavelet Mel Frequency Cepstral Coefficient (DMFCC). GMFCC models basilar membrane displacement via a gammatone filter bank and is useful for recognizing gender from emotional speech. DMFCC applies MFCC analysis to the high-frequency components of speech rather than the low-frequency components, and in the proposed work it is used for recognizing emotions from speech. The average accuracy of gender classification with Deep LSTM and GMFCC is 98.3%. The average emotion recognition rate with Deep LSTM and DMFCC is 92% for male speech and 88.7% for female speech. Our proposed model combines the above sub-models and achieves an emotion recognition accuracy of 91.2% for male speech and 87.6% for female speech.
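The DMFCC feature described above rests on a discrete wavelet transform that splits speech into a low-frequency approximation band and a high-frequency detail band, with MFCC-style analysis then applied to the detail band. A minimal sketch of that band split is shown below; it uses a single-level Haar filter purely for illustration (the paper's actual DWT filter choice, decomposition depth, and full MFCC pipeline are not specified here and would differ in practice):

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT: returns (approximation, detail) coefficients.

    The approximation band carries the low-frequency content that
    traditional MFCCs emphasize; the detail band carries the
    high-frequency content that a DMFCC-style analysis would use.
    """
    x = np.asarray(x, dtype=float)
    if len(x) % 2:          # Haar pairs samples, so drop a trailing odd sample
        x = x[:-1]
    a = (x[0::2] + x[1::2]) / np.sqrt(2)  # low-pass (approximation)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)  # high-pass (detail)
    return a, d

# Toy signal: a slow ramp (low-frequency) plus a fast +1/-1 alternation
# (high-frequency). The detail band isolates the fast alternation.
n = 16
x = np.arange(n, dtype=float) + np.array([1.0, -1.0] * (n // 2))
a, d = haar_dwt(x)
```

Because this Haar transform is orthonormal, the signal energy splits exactly between the two bands, which is why each band can be analyzed independently without losing information.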
Subject
General Physics and Astronomy
Cited by
3 articles.