Author:
Zhang Shiqing, Liu Ruixin, Tao Xin, Zhao Xiaoming
Abstract
Automatic speech emotion recognition (SER) is a challenging component of human-computer interaction (HCI). The existing literature mainly evaluates SER performance by training and testing on a single corpus in a single language. In many practical applications, however, the training and testing corpora differ substantially. Because speech emotion corpora and languages are so diverse, most previous SER methods perform poorly in real-world cross-corpus or cross-language scenarios. Inspired by the powerful feature-learning ability of recently emerged deep learning techniques, researchers have increasingly adopted advanced deep learning models for cross-corpus SER. This paper aims to provide an up-to-date and comprehensive survey of cross-corpus SER, with particular attention to deep learning techniques based on supervised, unsupervised and semi-supervised learning. It also highlights the challenges and opportunities in cross-corpus SER tasks and points out future trends.
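The gap between single-corpus and cross-corpus evaluation described in the abstract can be illustrated with a toy experiment. The sketch below is a hypothetical illustration, not a method from the survey: it assumes made-up one-dimensional utterance features, a nearest-centroid classifier, and an invented "corpus B" whose features are offset by a constant to mimic a recording-condition or language mismatch. Per-corpus z-normalization is included only as a simple baseline for reducing that mismatch; it is not one of the deep learning approaches the survey reviews.

```python
from statistics import mean, pstdev

# Hypothetical 1-D "acoustic feature" per utterance with an emotion label.
# All values are invented for illustration only.
CORPUS_A_TRAIN = [(0.8, "happy"), (1.0, "happy"), (1.2, "happy"),
                  (-1.2, "sad"), (-1.0, "sad"), (-0.8, "sad")]
CORPUS_A_TEST = [(0.9, "happy"), (1.1, "happy"), (-1.1, "sad"), (-0.9, "sad")]
# Hypothetical corpus B: same emotions, but every feature is shifted by +2.0
# to mimic a systematic difference between corpora.
CORPUS_B = [(x + 2.0, y) for x, y in CORPUS_A_TRAIN]


def fit_centroids(data):
    """Nearest-centroid classifier: store the mean feature value per label."""
    labels = {y for _, y in data}
    return {lab: mean(x for x, y in data if y == lab) for lab in labels}


def predict(centroids, x):
    """Return the label whose centroid is closest to feature x."""
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))


def accuracy(centroids, data):
    """Fraction of utterances whose predicted label matches the true label."""
    hits = sum(predict(centroids, x) == y for x, y in data)
    return hits / len(data)


def z_normalize(data):
    """Normalize a corpus with its own mean and (population) std."""
    xs = [x for x, _ in data]
    mu, sigma = mean(xs), pstdev(xs)
    return [((x - mu) / sigma, y) for x, y in data]


if __name__ == "__main__":
    model = fit_centroids(CORPUS_A_TRAIN)
    print("within-corpus:", accuracy(model, CORPUS_A_TEST))  # 1.0
    print("cross-corpus:", accuracy(model, CORPUS_B))        # 0.5

    model_n = fit_centroids(z_normalize(CORPUS_A_TRAIN))
    print("cross-corpus, normalized:",
          accuracy(model_n, z_normalize(CORPUS_B)))          # 1.0
```

Trained and tested on corpus A, the classifier is perfect. On the shifted corpus B, every "sad" utterance lands near the "happy" centroid, so accuracy drops to chance. Normalizing each corpus with its own statistics removes this particular (purely additive) shift; real cross-corpus mismatches are rarely this simple, which is why the learned adaptation methods surveyed in the paper are needed.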
Funder
National Natural Science Foundation of China
Subject
Artificial Intelligence, Biomedical Engineering
Cited by
12 articles.