Abstract
In the online mode of teaching and learning, which has become widespread in recent years, learners' involvement in instructional activities ranges from boredom to learning gain. Online educators therefore play a significant role in accurately and effectively determining their learners' engagement status in order to offer individualized pedagogical support through interventions. This work focuses on processing online video classes to analyze the emotional engagement of learners. A novel pipeline based on video face processing is proposed. First, the face in each facial video of the dataset is detected using the multitask cascaded convolutional neural network (MTCNN) framework designed for face detection. Then, a single efficient convolutional neural network (CNN) extracts the emotional features of each frame and predicts the corresponding emotions. The engagement level is then determined as the weighted average of the estimated probabilities of the predicted emotions. This network is pretrained on face detection and fine-tuned for emotion recognition on static images using a newly designed robust optimization technique. From the generated facial features, three levels of student engagement (highly engaged, engaged, and disengaged) and seven emotions (happy, sad, angry, neutral, scared, surprise, and disgust) are predicted quickly and simultaneously. Because all facial recordings can be processed privately and in real time on the students' mobile devices, there is no need to transfer them elsewhere. The proposed model detects emotions and engagement levels with an accuracy of 97.45%.
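To illustrate the final step of the pipeline, the following is a minimal sketch of mapping per-frame emotion probabilities to an engagement level via a weighted average. The per-emotion weights and the two thresholds are illustrative assumptions for this example; the paper does not specify them in the abstract.

```python
# Hypothetical engagement scoring from per-frame emotion probabilities.
# Weights and thresholds below are assumed for illustration only.

EMOTIONS = ["happy", "sad", "angry", "neutral", "scared", "surprise", "disgust"]

# Assumed engagement weight for each emotion, in [0, 1].
WEIGHTS = {
    "happy": 1.0, "surprise": 0.9, "neutral": 0.7,
    "scared": 0.4, "sad": 0.3, "angry": 0.2, "disgust": 0.1,
}

def engagement_score(probs):
    """Weighted average of the predicted emotion probabilities."""
    assert abs(sum(probs.values()) - 1.0) < 1e-6, "probabilities must sum to 1"
    return sum(WEIGHTS[e] * p for e, p in probs.items())

def engagement_level(score):
    """Map the score onto the three levels (thresholds are assumed)."""
    if score >= 0.75:
        return "highly engaged"
    if score >= 0.45:
        return "engaged"
    return "disengaged"

# Example softmax output from the emotion CNN for one frame.
frame_probs = {"happy": 0.6, "neutral": 0.2, "surprise": 0.1,
               "sad": 0.05, "angry": 0.02, "scared": 0.02, "disgust": 0.01}
score = engagement_score(frame_probs)   # 0.858 with these weights
level = engagement_level(score)         # "highly engaged"
```

In a full implementation, this scoring would be applied per frame after MTCNN face detection and CNN emotion prediction, and the per-frame scores aggregated over the video.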