Author:
Fu Hongliang, Zhuang Zhihao, Wang Yang, Huang Chen, Duan Wenzhuo
Abstract
To address feature distribution discrepancy in cross-corpus speech emotion recognition, this paper proposes an emotion recognition model based on multi-task learning and subdomain adaptation, which alleviates the impact of this discrepancy on emotion recognition. Existing methods fall short in speech feature representation and in cross-corpus feature distribution alignment. The proposed model uses a deep denoising auto-encoder as the shared feature extraction network for multi-task learning, and a fully connected layer and a softmax layer are added before each recognition task as task-specific layers. A subdomain adaptation algorithm for emotion and gender features is then added to the shared network to obtain the shared emotion features and gender features of the source domain and target domain, respectively. Multi-task learning enhances the representation ability of the features, while subdomain adaptation improves their transferability and alleviates the impact of distribution differences in emotional features. Averaged over six cross-corpus speech emotion recognition experiments, the weighted average recall is 1.89% to 10.07% higher than that of the other models compared, which supports the validity of the proposed model.
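The subdomain adaptation idea described above can be illustrated with a minimal sketch. The abstract does not state the kernel, the weighting scheme, or how the term enters the training loss, so everything here is an assumption: the function names (`weighted_mean`, `subdomain_discrepancy`) are hypothetical, a linear kernel stands in for whatever kernel the paper uses, and target samples are weighted by predicted class probabilities because target labels are unavailable. The quantity computed is the squared distance between the class-conditional feature means of the source and target domains, averaged over classes.

```python
from typing import List, Sequence

Vector = Sequence[float]


def weighted_mean(features: Sequence[Vector], weights: Sequence[float]) -> List[float]:
    """Mean of the feature vectors under non-negative weights with a positive sum."""
    total = sum(weights)
    dim = len(features[0])
    return [
        sum(w * f[d] for f, w in zip(features, weights)) / total
        for d in range(dim)
    ]


def subdomain_discrepancy(
    source_feats: Sequence[Vector],
    source_labels: Sequence[int],
    target_feats: Sequence[Vector],
    target_probs: Sequence[Sequence[float]],
    num_classes: int,
) -> float:
    """Linear-kernel simplification of a class-wise (subdomain) discrepancy.

    Source samples are assigned to subdomains by their true labels.
    Target samples are assigned softly, by predicted class probabilities
    (an assumption; the abstract does not describe the weighting).
    """
    total = 0.0
    used = 0
    for c in range(num_classes):
        src_w = [1.0 if y == c else 0.0 for y in source_labels]
        tgt_w = [p[c] for p in target_probs]
        # Skip classes that are empty in either domain.
        if sum(src_w) == 0 or sum(tgt_w) == 0:
            continue
        mu_s = weighted_mean(source_feats, src_w)
        mu_t = weighted_mean(target_feats, tgt_w)
        total += sum((a - b) ** 2 for a, b in zip(mu_s, mu_t))
        used += 1
    return total / used if used else 0.0
```

In a multi-task setting such as the one described, a term of this kind would presumably be computed once with emotion labels and once with gender labels on the shared features, then added to the two classification losses. How the terms are weighted is not given in the abstract.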
Funder
National Natural Science Foundation of China
Natural Science Project of Henan Education Department
Start-up Fund for High-level Talents of Henan University of Technology
Subject
General Physics and Astronomy
Cited by
3 articles.