Gaze Estimation via Strip Pooling and Multi-Criss-Cross Attention Networks

Published: 2023-05-10, Volume: 13, Issue: 10, Page: 5901
ISSN: 2076-3417
Journal: Applied Sciences
Language: English

Author:

Yan Chao¹,², Pan Weiguo¹,², Xu Cheng¹,², Dai Songyin¹,², Li Xuewei¹,²

Affiliation:

1. Beijing Key Laboratory of Information Service Engineering, Beijing Union University, Beijing 100101, China

2. Institute for Brain and Cognitive Sciences, College of Robotics, Beijing Union University, Beijing 100101, China

Abstract

Deep learning techniques for gaze estimation usually determine gaze direction directly from face images. These algorithms perform well because face images contain more feature information than eye images. However, face images also contain a substantial amount of redundant information that may interfere with gaze prediction and limit further performance gains. To address these issues, we model long-range dependencies between the eyes with Strip Pooling and Multi-Criss-Cross Attention Networks (SPMCCA-Net), which consist of two newly designed modules. The first is a feature-enhancement bottleneck block based on strip pooling. By incorporating strip pooling, this residual module not only enlarges its receptive field to capture long-range dependencies between the eyes but also increases the weights of important features and reduces interference from redundant, gaze-irrelevant information. The second is a multi-criss-cross attention network, which uses a criss-cross attention mechanism to further strengthen long-range dependencies between the eyes by incorporating the distribution of eye-gaze features, providing more gaze cues for improving estimation accuracy. The network is trained with a multi-loss function that combines smooth L1 loss and cross-entropy loss, which speeds up training convergence while improving gaze estimation precision. Extensive experiments demonstrate that SPMCCA-Net outperforms several state-of-the-art methods, achieving mean angular errors of 10.13° on the Gaze360 dataset and 6.61° on the RT-GENE dataset.
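
The abstract does not specify the layers of the strip-pooling bottleneck block, so the following is only a minimal NumPy sketch of the general strip pooling idea from Hou et al. (2020): each position is reweighted using the average of its entire row and its entire column, giving it a receptive field that spans the full width and height of the feature map. The function names, the identity replacement for the learned convolutions, and the simple residual wrapper are assumptions for illustration, not SPMCCA-Net's actual design.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def strip_pooling_gate(x):
    """Reweight a (C, H, W) feature map with horizontal and vertical strip context.

    Simplification (assumption): the learned 1-D convolutions and the 1x1 fusion
    convolution of the original strip pooling module are replaced by identity maps.
    """
    # Horizontal strip: average every row over the width -> (C, H, 1)
    row_strip = x.mean(axis=2, keepdims=True)
    # Vertical strip: average every column over the height -> (C, 1, W)
    col_strip = x.mean(axis=1, keepdims=True)
    # Broadcasting expands both strips back to (C, H, W); fuse them by addition
    fused = row_strip + col_strip
    # The sigmoid turns the fused context into per-position weights in (0, 1)
    return x * sigmoid(fused)


def strip_pooling_residual(x):
    # Hypothetical residual wrapper: identity path plus the reweighted features
    return x + strip_pooling_gate(x)


x = np.random.default_rng(0).standard_normal((4, 6, 8))  # (channels, height, width)
y = strip_pooling_residual(x)
print(y.shape)  # (4, 6, 8)
```

Because a horizontal strip covers a whole row, a feature in one eye region can influence the weights at the other eye when both lie at roughly the same height; this is one plausible reading of how strip pooling helps capture the long-range dependence between the eyes.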
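
Criss-cross attention was introduced in CCNet (Huang et al., 2019); the abstract does not describe how SPMCCA-Net's multi-criss-cross variant differs from it. With that caveat, the sketch below shows a single generic criss-cross attention pass in NumPy: each position computes attention weights only over the positions in its own row and column. The projection matrices wq, wk, wv and the fixed gamma are illustrative assumptions, and the loops favor readability over speed.

```python
import numpy as np


def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def criss_cross_attention(x, wq, wk, wv, gamma=1.0):
    """One generic CCNet-style pass over a (C, H, W) map (assumption: not SPMCCA-Net's exact module).

    wq, wk: (D, C) query/key projections; wv: (C, C) value projection.
    Every position attends only to the H + W - 1 positions in its own row and column.
    """
    _, H, W = x.shape
    q = np.einsum("dc,chw->dhw", wq, x)
    k = np.einsum("dc,chw->dhw", wk, x)
    v = np.einsum("dc,chw->dhw", wv, x)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            # Whole row i, plus column j with (i, j) removed so it is counted once
            keys = np.concatenate([k[:, i, :], np.delete(k[:, :, j], i, axis=1)], axis=1)
            vals = np.concatenate([v[:, i, :], np.delete(v[:, :, j], i, axis=1)], axis=1)
            attn = softmax(q[:, i, j] @ keys)  # (H + W - 1,) weights summing to 1
            out[:, i, j] = vals @ attn         # attention-weighted sum of values
    # Residual connection; CCNet learns gamma, here it is a fixed argument
    return x + gamma * out


rng = np.random.default_rng(1)
x = rng.standard_normal((3, 5, 6))
wq, wk = rng.standard_normal((2, 3)), rng.standard_normal((2, 3))
wv = rng.standard_normal((3, 3))
# Two passes: after the second, every position has indirectly seen every other one
y = criss_cross_attention(criss_cross_attention(x, wq, wk, wv), wq, wk, wv)
```

A single pass only links positions that share a row or a column; applying it twice, as in CCNet's recurrent scheme, lets every position gather context from the whole map. Whether the "multi" in SPMCCA-Net refers to such repetition is not stated in the abstract.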
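
The abstract states only that training combines smooth L1 loss and cross-entropy loss. One common way to pair the two in gaze estimation is to classify each angle into discrete bins (cross-entropy) and to regress the expected angle computed from the bin probabilities (smooth L1). The following sketch assumes that scheme; the nearest-bin labeling rule, the bin centers, and the weight alpha are hypothetical and may differ from the paper.

```python
import numpy as np


def smooth_l1(pred, target, beta=1.0):
    diff = np.abs(pred - target)
    # Quadratic for small errors (smooth gradients), linear for large ones (robust to outliers)
    return np.where(diff < beta, 0.5 * diff**2 / beta, diff - 0.5 * beta).mean()


def log_softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def multi_loss(logits, angle_deg, bin_centers, alpha=1.0):
    """Hypothetical combined loss for one gaze angle (e.g. yaw).

    logits: (N, B) scores over B angle bins; angle_deg: (N,) ground-truth angles;
    bin_centers: (B,) bin centers in degrees; alpha: weight of the regression term.
    """
    log_p = log_softmax(logits)
    # Classification target (assumption): index of the bin center nearest the true angle
    bins = np.abs(angle_deg[:, None] - bin_centers[None, :]).argmin(axis=1)
    cross_entropy = -log_p[np.arange(len(bins)), bins].mean()
    # Regression: expected angle under the predicted bin distribution
    expected = np.exp(log_p) @ bin_centers
    return cross_entropy + alpha * smooth_l1(expected, angle_deg)
```

The classification term gives a well-conditioned signal early in training, while the smooth L1 term on the expected angle penalizes fine-grained errors inside a bin; this division of labor is one plausible reason such a combination could speed convergence.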
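
The reported results (10.13° on Gaze360, 6.61° on RT-GENE) are mean angular errors, that is, the average angle between predicted and ground-truth 3-D gaze directions. A minimal sketch of the metric follows; the pitch/yaw-to-vector convention in pitchyaw_to_vector is an assumption, since datasets use different coordinate systems.

```python
import numpy as np


def pitchyaw_to_vector(pitch, yaw):
    # Assumed convention: (pitch, yaw) in radians -> unit 3-D gaze vector; datasets differ
    return np.stack([-np.cos(pitch) * np.sin(yaw),
                     -np.sin(pitch),
                     -np.cos(pitch) * np.cos(yaw)], axis=-1)


def mean_angular_error_deg(pred, gt):
    """Mean angle, in degrees, between (N, 3) predicted and ground-truth gaze vectors."""
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    gt = gt / np.linalg.norm(gt, axis=1, keepdims=True)
    # Clip guards against rounding pushing the cosine slightly outside [-1, 1]
    cos = np.clip((pred * gt).sum(axis=1), -1.0, 1.0)
    return np.degrees(np.arccos(cos)).mean()


gt = pitchyaw_to_vector(np.zeros(2), np.zeros(2))
pred = pitchyaw_to_vector(np.zeros(2), np.radians([10.0, 0.0]))
print(mean_angular_error_deg(pred, gt))  # about 5.0: errors of 10 and 0 degrees
```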

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/13/10/5901/pdf

References (35 articles)

1. Gu (2019). Research on fatigue detection method based on multi-scale pooled convolutional neural network. Comput. Appl. Res.

2. Ghosh, S., Dhall, A., Hayat, M., Knibbe, J., and Ji, Q. (2021). Automatic gaze analysis: A survey of deep learning based approaches. arXiv.

3. Gou (2021). Progress and prospects of eye-tracking research. J. Autom.

4. Cheng, Y., Wang, H., Bao, Y., and Lu, F. (2021). Appearance-based gaze estimation with deep learning: A review and benchmark. arXiv.

5. Hou, Q., Zhang, L., Cheng, M.M., and Feng, J. (2020, June 13–19). Strip pooling: Rethinking spatial pooling for scene parsing. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA.

Cited by 2 articles.

1. Joint pyramidal perceptual attention and hierarchical consistency constraint for gaze estimation. Computer Vision and Image Understanding, 2024-11.

2. End-to-End Video Gaze Estimation via Capturing Head-Face-Eye Spatial-Temporal Interaction Context. IEEE Signal Processing Letters, 2023.