Author:
Abdelrahman Ahmed A., Hempel Thorsten, Khalifa Aly, Al-Hamadi Ayoub
Abstract
Human gaze is a crucial cue in applications such as human-robot interaction, autonomous driving, and virtual reality. Recently, convolutional neural network (CNN) approaches have made notable progress in predicting gaze angles. However, estimating accurate gaze direction in the wild remains challenging, because the most informative gaze cues are concentrated in the eye region, which occupies only a small part of a face image. In this paper, we introduce a novel two-branch CNN architecture with a multi-loss approach to estimate gaze angles (pitch and yaw) from face images. Our approach uses a separate fully connected layer for each gaze angle, allowing the network to learn discriminative features explicitly and to emphasize the distinct information associated with each angle. Moreover, we adopt a multi-loss approach that incorporates both classification and regression losses, so that the combined loss for each gaze angle is optimized jointly, improving overall gaze performance. To evaluate our model, we conduct experiments on three popular datasets collected under unconstrained settings: MPIIFaceGaze, Gaze360, and RT-GENE. Our proposed model achieves state-of-the-art performance on all three datasets, surpassing current methods and demonstrating its capability in gaze estimation.
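To make the per-angle multi-loss idea more concrete, here is a minimal sketch of one plausible reading of it, not the authors' implementation. It assumes each angle head outputs logits over fixed-width angle bins; the classification loss is cross-entropy on the ground-truth bin, and the regression loss is the squared error between the ground-truth angle and the softmax-weighted mean of the bin centers. The bin width (4 degrees), range ([-42, 42) degrees), weight `beta`, and all function names are hypothetical choices for illustration; the abstract does not specify them. The CNN backbone and fully connected heads are not modeled; only their logits are.

```python
import math
from typing import List, Sequence

# Hypothetical settings for illustration only (not taken from the paper):
# angles in degrees, 4-degree bins covering [-42, 42), regression weight beta = 1.0.
BIN_WIDTH = 4.0
ANGLE_MIN = -42.0
NUM_BINS = 21  # (42 - (-42)) / 4
BETA = 1.0


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the bin logits of one angle head."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_index(angle: float) -> int:
    """Map a continuous angle to its bin label, clamped to the valid bin range."""
    idx = int((angle - ANGLE_MIN) // BIN_WIDTH)
    return min(max(idx, 0), NUM_BINS - 1)


def expected_angle(probs: Sequence[float]) -> float:
    """Regression output: probability-weighted mean of the bin centers."""
    return sum(p * (ANGLE_MIN + (i + 0.5) * BIN_WIDTH) for i, p in enumerate(probs))


def combined_loss(logits: Sequence[float], target: float, beta: float = BETA) -> float:
    """Classification (cross-entropy on the target bin) plus beta * squared angle error."""
    probs = softmax(logits)
    ce = -math.log(probs[bin_index(target)])
    mse = (expected_angle(probs) - target) ** 2
    return ce + beta * mse


def total_gaze_loss(yaw_logits: Sequence[float], pitch_logits: Sequence[float],
                    yaw: float, pitch: float) -> float:
    """Each angle has its own head and its own combined loss; summed here for illustration."""
    return combined_loss(yaw_logits, yaw) + combined_loss(pitch_logits, pitch)


if __name__ == "__main__":
    # Logits sharply peaked on the bin centered at 0 degrees (bin 10).
    peaked = [0.0] * NUM_BINS
    peaked[10] = 10.0
    print(combined_loss(peaked, 0.0))   # small: correct bin, near-zero angle error
    print(combined_loss(peaked, 20.0))  # large: wrong bin and a 20-degree error
```

Keeping the classification term alongside the regression term means each head is pushed both toward the correct coarse bin and toward a precise continuous angle; the relative weighting (`beta` here) is a tuning choice that this sketch simply assumes.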
Funder
Bundesministerium für Bildung und Forschung
Deutsche Forschungsgemeinschaft
Publisher
Springer Science and Business Media LLC