Predicting the Arousal and Valence Values of Emotional States Using Learned, Predesigned, and Deep Visual Features
Authors:
Itaf Omar Joudeh 1, Ana-Maria Cretu 1, Stéphane Bouchard 2
Affiliations:
1. Department of Computer Science and Engineering, University of Quebec in Outaouais, Gatineau, QC J8Y 3G5, Canada
2. Department of Psychoeducation and Psychology, University of Quebec in Outaouais, Gatineau, QC J8X 3X7, Canada
Abstract
The cognitive state of a person can be categorized using the circumplex model of emotional states, a continuous model with two dimensions: arousal and valence. The purpose of this research is to select machine learning models to be integrated into a virtual reality (VR) system that runs cognitive remediation exercises for people with mental health disorders. The prediction of emotional states is therefore essential for customizing treatments for these individuals. We exploit the Remote Collaborative and Affective Interactions (RECOLA) database to predict arousal and valence values using machine learning techniques. RECOLA includes audio, video, and physiological recordings of interactions between human participants. To allow learners to focus on the most relevant data, features are extracted from the raw data. Such features can be predesigned, learned, or extracted implicitly by deep learners. Our previous work on video recordings focused on predesigned and learned visual features. In this paper, we extend that work to deep visual features, which we extract using the MobileNet-v2 convolutional neural network (CNN) that we previously trained on RECOLA’s video frames of full and half faces. As the final purpose of our work is to integrate our solution into a practical VR application using head-mounted displays, we experimented with half faces as a proof of concept. The extracted deep features were then used to predict arousal and valence values via optimizable ensemble regression. We also fused the deep visual features with the predesigned visual features and predicted arousal and valence values from the combined feature set. To further enhance prediction performance, we fused the predictions of the optimizable ensemble model with those of the MobileNet-v2 model. After decision fusion, we achieved a root mean squared error (RMSE) of 0.1140, a Pearson’s correlation coefficient (PCC) of 0.8000, and a concordance correlation coefficient (CCC) of 0.7868 on arousal predictions, and an RMSE of 0.0790, a PCC of 0.7904, and a CCC of 0.7645 on valence predictions.
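The pipeline summarized above (deep feature extraction, feature-level fusion, ensemble regression, and decision-level fusion) can be illustrated with a short sketch. The sketch below is a stand-in under stated assumptions, not the authors’ code: it uses an ImageNet-pretrained MobileNet-v2 from torchvision in place of the network fine-tuned on RECOLA, scikit-learn’s GradientBoostingRegressor in place of optimizable ensemble regression, and randomly generated placeholder frames, predesigned features, and labels.

```python
# Sketch of the described pipeline; all data and names here are illustrative.
import numpy as np
import torch
import torchvision.models as models
from sklearn.ensemble import GradientBoostingRegressor

# 1. Deep feature extractor: MobileNet-v2 with its classifier head removed,
#    so the forward pass returns 1280-D pooled feature vectors.
backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.DEFAULT)
backbone.classifier = torch.nn.Identity()
backbone.eval()

def extract_deep_features(frames: torch.Tensor) -> np.ndarray:
    """frames: (N, 3, 224, 224) preprocessed face crops -> (N, 1280) features."""
    with torch.no_grad():
        return backbone(frames).numpy()

# 2. Feature fusion: concatenate deep features with predesigned visual features.
frames = torch.rand(8, 3, 224, 224)         # placeholder full/half-face crops
predesigned = np.random.rand(8, 40)         # placeholder predesigned features
fused = np.hstack([extract_deep_features(frames), predesigned])

# 3. Ensemble regression on the fused features (one regressor per dimension;
#    arousal shown, valence handled identically).
arousal = np.random.uniform(-1, 1, size=8)  # placeholder gold-standard labels
ensemble = GradientBoostingRegressor().fit(fused, arousal)
ensemble_pred = ensemble.predict(fused)

# 4. Decision fusion: average the ensemble's predictions with the CNN's own
#    regression output (simulated here by a placeholder array).
cnn_pred = np.random.uniform(-1, 1, size=8)
final_pred = (ensemble_pred + cnn_pred) / 2.0
```

The three reported metrics can be computed for any pair of label and prediction vectors with the standard formulations below; the CCC follows its usual definition, 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2).

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def pcc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Pearson's correlation coefficient."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

def ccc(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Concordance correlation coefficient."""
    mx, my = y_true.mean(), y_pred.mean()
    vx, vy = y_true.var(), y_pred.var()
    cov = np.mean((y_true - mx) * (y_pred - my))
    return float(2 * cov / (vx + vy + (mx - my) ** 2))
```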
Funder
Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant
References (23 articles)
1. Russell, J. (1979). Affective Space Is Bipolar. American Psychological Association.
2. Ringeval, F., Sonderegger, A., Sauer, J., and Lalanne, D. (2013, January 22–26). Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions. Proceedings of the 10th IEEE International Conference and Workshops on Automatic Face and Gesture Recognition (FG), Shanghai, China.
3. Joudeh, I.O., Cretu, A., Guimond, S., and Bouchard, S. (2022). Prediction of Emotional Measures via Electrodermal Activity (EDA) and Electrocardiogram (ECG). Eng. Proc., 27.
4. Joudeh, I.O., Cretu, A.-M., Bouchard, S., and Guimond, S. (2023). Prediction of Continuous Emotional Measures through Physiological and Visual Data. Sensors, 23.
5. Joudeh, I.O., Cretu, A.-M., Bouchard, S., and Guimond, S. (2023, January 11–13). Prediction of Emotional States from Partial Facial Features for Virtual Reality Applications. Proceedings of the 26th Annual CyberPsychology, CyberTherapy and Social Networking Conference (CYPSY26), Paris, France.