Affiliation:
1. Veltech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, India
2. ECE, Veltech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology, Avadi, Chennai, India
Abstract
Emotion recognition from speech signals plays a crucial role in human-computer interaction and behavioral studies. The task, however, presents significant challenges due to the high dimensionality and noisy nature of speech data. This article presents a comprehensive study and analysis of a novel approach, “Digital Features Optimization by Diversity Measure Fusion (DFOFDM)”, aimed at addressing these challenges. The article begins by explaining the need for improved emotion recognition methods, followed by a detailed introduction to DFOFDM. This approach employs acoustic and spectral features from speech signals, coupled with an optimized feature selection process that uses a fusion of diversity measures. The study’s central method involves a Cuckoo Search-based classification strategy, which is tailored for this multi-label problem. The performance of the proposed DFOFDM approach is evaluated extensively. Emotion labels such as ‘Angry’, ‘Happy’, and ‘Neutral’ showed a precision above 92%, while other emotions fell within the range of 87% to 90%. Similar performance was observed in terms of recall, with most emotions falling within the 90% to 95% range. The F-Score, another crucial metric, showed comparable values for each label. Notably, the DFOFDM model showed resilience to label imbalance and noise in speech data, a property that is crucial for real-world applications. When compared with a contemporary model, “Transfer Subspace Learning by Least Square Loss (TSLSL)”, DFOFDM displayed superior results across various evaluation metrics, indicating a promising improvement in the field of speech emotion recognition. In terms of computational complexity, DFOFDM demonstrated effective scalability, providing a feasible solution for large-scale applications. Despite its effectiveness, the study acknowledges the potential limitations of DFOFDM, which might influence its performance on certain types of real-world data.
The findings underline the potential of DFOFDM in advancing emotion recognition techniques, indicating the necessity for further research.
Subject
Artificial Intelligence, General Engineering, Statistics and Probability