Affiliation:
1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing, China
2. Institute for Brain and Cognitive Sciences, College of Robotics, Beijing Union University, Beijing, China
Abstract
The limited texture detail in low‐resolution facial or eye images presents a challenge for gaze estimation. To address this, FSKT‐GE (feature‐map similarity knowledge transfer for low‐resolution gaze estimation) is proposed, a gaze estimation framework consisting of a high‐resolution (HR) network and a low‐resolution (LR) network with identical structure. Rather than relying on mere feature imitation, the problem is addressed by assessing the cosine similarity of feature layers, emphasizing the distribution similarity between the HR and LR networks and enabling the LR network to acquire richer knowledge. The framework uses a combined loss function incorporating a cosine similarity measurement, a soft loss based on the probability distribution difference and the gaze direction output, and a hard loss from the LR network's output layer. The approach is validated on low‐resolution datasets derived from the Gaze360 and RT‐Gene datasets, demonstrating excellent performance in low‐resolution gaze estimation. Evaluations are conducted on low‐resolution images obtained through 2×, 4×, and 8× down‐sampling on both datasets. On the Gaze360 dataset, the lowest mean angular errors of 10.97°, 11.22°, and 13.61° were achieved, while on the RT‐Gene dataset, the lowest mean angular errors of 6.73°, 6.83°, and 7.75° were obtained.
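The combined loss described in the abstract can be illustrated with a minimal PyTorch sketch. This is not the paper's exact formulation: the layer pairing, the KL‐based probability term, the L1 terms, and the weights alpha, beta, and gamma are assumptions introduced here for illustration only.

```python
# Minimal sketch (assumed, not the authors' implementation) of a combined loss
# for HR -> LR feature-similarity knowledge transfer:
#   - feature loss: 1 - cosine similarity between matched HR/LR feature maps
#   - soft loss: teacher-student difference (probability distribution + gaze output)
#   - hard loss: LR gaze output vs. ground truth
import torch
import torch.nn.functional as F


def combined_loss(hr_feats, lr_feats, hr_logits, lr_logits, hr_gaze, lr_gaze,
                  gt_gaze, alpha=1.0, beta=1.0, gamma=1.0, temperature=4.0):
    # Feature-similarity term: align each LR feature map with the
    # corresponding HR feature map via cosine similarity.
    feat_loss = 0.0
    for f_hr, f_lr in zip(hr_feats, lr_feats):
        f_hr = f_hr.flatten(start_dim=1)
        f_lr = f_lr.flatten(start_dim=1)
        feat_loss += (1.0 - F.cosine_similarity(f_hr, f_lr, dim=1)).mean()
    feat_loss /= len(hr_feats)

    # Soft term: KL divergence between softened HR (teacher) and LR (student)
    # distributions, plus an L1 term on the gaze direction predicted by the
    # HR network (detached so only the LR network is updated).
    soft_loss = F.kl_div(
        F.log_softmax(lr_logits / temperature, dim=1),
        F.softmax(hr_logits.detach() / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    soft_loss = soft_loss + F.l1_loss(lr_gaze, hr_gaze.detach())

    # Hard term: error between the LR network's gaze prediction and the
    # ground-truth gaze direction.
    hard_loss = F.l1_loss(lr_gaze, gt_gaze)

    return alpha * feat_loss + beta * soft_loss + gamma * hard_loss
```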
Funder
Natural Science Foundation of Beijing Municipality
National Natural Science Foundation of China
Publisher
Institution of Engineering and Technology (IET)