1. Akçay, M.B., Oğuz, K.: Speech emotion recognition: emotional models, databases, features, preprocessing methods, supporting modalities, and classifiers. Speech Commun. 116, 56–76 (2020)
2. Bhatti, M.W., Wang, Y., Guan, L.: A neural network approach for human emotion recognition in speech. In: 2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No. 04CH37512), vol. 2, pp. II–181. IEEE (2004)
3. Hao, M., Cao, W.-H., Liu, Z.-T., Min, W., Xiao, P.: Visual-audio emotion recognition based on multi-task and ensemble learning with multiple features. Neurocomputing 391, 42–51 (2020)
4. Kanluan, I., Grimm, M., Kroschel, K.: Audio-visual emotion recognition using an emotion space concept. In: 2008 16th European Signal Processing Conference, pp. 1–5. IEEE (2008)
5. Zhang, S., Zhang, S., Huang, T., Gao, W., Tian, Q.: Learning affective features with a hybrid deep model for audio–visual emotion recognition. IEEE Trans. Circuits Syst. Video Technol. 28(10), 3030–3043 (2017)