In the Eye of Transformer: Global–Local Correlation for Egocentric Gaze Estimation and Beyond

Author:

Lai BolinORCID,Liu Miao,Ryan Fiona,Rehg James M.

Abstract

AbstractPredicting human’s gaze from egocentric videos serves as a critical role for human intention understanding in daily activities. In this paper, we present the first transformer-based model to address the challenging problem of egocentric gaze estimation. We observe that the connection between the global scene context and local visual information is vital for localizing the gaze fixation from egocentric video frames. To this end, we design the transformer encoder to embed the global context as one additional visual token and further propose a novel global–local correlation module to explicitly model the correlation of the global token and each local token. We validate our model on two egocentric video datasets – EGTEA Gaze + and Ego4D. Our detailed ablation studies demonstrate the benefits of our method. In addition, our approach exceeds the previous state-of-the-art model by a large margin. We also apply our model to a novel gaze saccade/fixation prediction task and the traditional action recognition problem. The consistent gains suggest the strong generalization capability of our model. We also provide additional visualizations to support our claim that global–local correlation serves a key representation for predicting gaze fixation from egocentric videos. More details can be found in our website (https://bolinlai.github.io/GLC-EgoGazeEst).

Funder

National Institutes of Health

Publisher

Springer Science and Business Media LLC

Subject

Artificial Intelligence,Computer Vision and Pattern Recognition,Software

Cited by 4 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. An Outlook into the Future of Egocentric Vision;International Journal of Computer Vision;2024-05-28

2. Privacy Preserving Gaze Estimation Via Federated Learning Adapted To Egocentric Video;ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP);2024-04-14

3. A Survey on Multimodal Large Language Models for Autonomous Driving;2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW);2024-01-01

4. SwinGaze: Egocentric Gaze Estimation with Video Swin Transformer;2023 IEEE 16th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC);2023-12-18

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3