Affiliation:
1. Department of Computer Engineering, Faculty of Engineering and Technology, Arab Academy for Science, Technology and Maritime Transport (AAST), Alexandria 1029, Egypt
Abstract
Emotion recognition is a key problem in artificial intelligence, particularly in human–computer interaction. Accurately discerning and interpreting emotions helps machines infer users’ underlying intentions, enabling smoother interaction and a better user experience. The recent growth of social media and the resulting volume of unstructured data have created strong demand for automated emotion recognition systems, and artificial intelligence (AI) techniques have emerged as a powerful solution to this problem. In particular, multimodal AI-driven approaches have proven effective at capturing the interplay of human expression cues that manifest across several modalities. This study develops a multimodal emotion recognition system, MM-EMOR, targeting the audio and text modalities. The audio branch combines Mel spectrogram and chromagram features with a MobileNet convolutional neural network (CNN), while an attention-based RoBERTa model handles the text modality. The approach is evaluated on three datasets, where MM-EMOR outperforms competing models, with accuracy gains of 7% on one dataset, 8% on another, and 18% on the third.
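To make the pipeline concrete, the sketch below shows how the two front ends named above could be implemented: Mel spectrogram and chromagram extraction for the audio branch (here with librosa) and RoBERTa encoding for the text branch (here with Hugging Face transformers). This is a minimal illustration under stated assumptions, not the paper's implementation; all parameter values and the checkpoint name are illustrative, as the abstract does not specify them.

    # Illustrative sketch of the two front ends described in the abstract.
    # Assumptions: librosa for audio features, Hugging Face transformers for
    # RoBERTa; sample rate, n_mels, hop_length, and the "roberta-base"
    # checkpoint are placeholder choices, not values from the paper.
    import numpy as np
    import librosa
    from transformers import AutoTokenizer, AutoModel

    def audio_features(path, sr=16000, n_mels=128, hop_length=512):
        y, sr = librosa.load(path, sr=sr)
        # Log-scaled Mel spectrogram: perceptually weighted time-frequency energy
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels,
                                             hop_length=hop_length)
        log_mel = librosa.power_to_db(mel, ref=np.max)
        # Chromagram: spectral energy folded onto 12 pitch classes
        chroma = librosa.feature.chroma_stft(y=y, sr=sr, hop_length=hop_length)
        # Downstream, these 2-D features would be resized/stacked as input
        # to a MobileNet-style CNN
        return log_mel, chroma

    def text_features(text):
        tokenizer = AutoTokenizer.from_pretrained("roberta-base")
        model = AutoModel.from_pretrained("roberta-base")
        inputs = tokenizer(text, return_tensors="pt", truncation=True)
        # Per-token hidden states; an attention-based pooling layer would
        # sit on top of these in the text branch
        return model(**inputs).last_hidden_state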
Subject
Artificial Intelligence, Computer Science Applications, Information Systems, Management Information Systems