Fine-grained gaze estimation based on the combination of regression and classification losses

Published: 2024-09-03
ISSN: 0924-669X
Container-title: Applied Intelligence
Language: en
Short-container-title: Appl Intell

Author:

Abdelrahman Ahmed A., Hempel Thorsten, Khalifa Aly, Al-Hamadi Ayoub

Abstract

Human gaze is a crucial cue used in various applications such as human-robot interaction, autonomous driving, and virtual reality. Recently, convolutional neural network (CNN) approaches have made notable progress in predicting gaze angles. However, estimating accurate gaze direction in the wild remains challenging because the most crucial gaze information lies in the eye area, which constitutes only a small part of the face image. In this paper, we introduce a novel two-branch CNN architecture with a multi-loss approach to estimate gaze angles (pitch and yaw) from face images. Our approach uses separate fully connected layers for each gaze angle prediction, allowing explicit learning of discriminative features and emphasizing the distinct information associated with each gaze angle. Moreover, we adopt a multi-loss approach that incorporates both classification and regression losses. This allows joint optimization of the combined loss for each gaze angle, resulting in improved overall gaze performance. To evaluate our model, we conduct experiments on three popular datasets collected under unconstrained settings: MPIIFaceGaze, Gaze360, and RT-GENE. Our proposed model achieves state-of-the-art performance on all three datasets.
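The abstract does not say how the classification and regression losses are combined, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It assumes each angle is discretised into bins, that cross-entropy is applied to the bin label, and that a squared error is applied to the angle recovered as the softmax-weighted mean of the bin centres. The bin width, angle range, weight `beta`, the summing of the two per-angle losses, and all function names are hypothetical choices made for illustration.

```python
import math

# Assumed discretisation (not from the abstract): 3-degree bins over [-42, 42).
BIN_WIDTH = 3.0
ANGLE_MIN = -42.0
NUM_BINS = 28


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def angle_to_bin(angle_deg):
    """Map a continuous angle to the index of its bin, clamped to the valid range."""
    index = int((angle_deg - ANGLE_MIN) // BIN_WIDTH)
    return min(max(index, 0), NUM_BINS - 1)


def expected_angle(probs):
    """Continuous prediction: probability-weighted mean of the bin centres."""
    centres = [ANGLE_MIN + (i + 0.5) * BIN_WIDTH for i in range(NUM_BINS)]
    return sum(p * c for p, c in zip(probs, centres))


def combined_loss(logits, target_deg, beta=1.0):
    """Cross-entropy on the binned target plus beta times squared angle error."""
    probs = softmax(logits)
    classification = -math.log(probs[angle_to_bin(target_deg)])
    regression = (expected_angle(probs) - target_deg) ** 2
    return classification + beta * regression


def gaze_loss(pitch_logits, yaw_logits, pitch_deg, yaw_deg, beta=1.0):
    """One combined loss per angle head; summing them is an assumption."""
    return (combined_loss(pitch_logits, pitch_deg, beta)
            + combined_loss(yaw_logits, yaw_deg, beta))
```

In this sketch `pitch_logits` and `yaw_logits` stand in for the outputs of the two separate fully connected layers. The classification term rewards picking the right bin, while the regression term keeps the continuous estimate fine-grained within and across bins.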

Funder

Bundesministerium für Bildung und Forschung

Deutsche Forschungsgemeinschaft

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10489-024-05778-3.pdf

References: 62 articles.

1. Hempel T, Al-Hamadi A (2020) SLAM-based multistate tracking system for mobile human-robot interaction. In: International conference on image analysis and recognition, pp 368–376. Springer

2. Strazdas D, Hintz J, Khalifa A, Abdelrahman AA, Hempel T, Al-Hamadi A (2022) Robot system assistant (RoSA): Towards intuitive multi-modal and multi-device human-robot interaction. Sensors 22(3):923

3. Abdelrahman AA, Strazdas D, Khalifa A, Hintz J, Hempel T, Al-Hamadi A (2022) Multi-modal engagement prediction in multi-person human-robot interaction. IEEE Access

4. Hu Z, Lv C, Hang P, Huang C, Xing Y (2021) Data-driven estimation of driver attention using calibration-free eye gaze and scene features. IEEE Trans Ind Electron 69(2):1800–1808

5. Vora S, Rangesh A, Trivedi MM (2018) Driver gaze zone estimation using convolutional neural networks: A general framework and ablative analysis. IEEE Trans Intell Veh 3(3):254–265