Authors:
Yupei Chen, Zhibo Yang, Seoyoung Ahn, Dimitris Samaras, Minh Hoai, Gregory Zelinsky
Abstract
Attention control is a basic behavioral process that has been studied for decades. The current best models of attention control are deep networks trained on free-viewing behavior to predict bottom-up attention control (saliency). We introduce COCO-Search18, the first dataset of laboratory-quality goal-directed behavior large enough to train deep-network models. We collected eye-movement behavior from 10 people searching for each of 18 target-object categories in 6202 natural-scene images, yielding ~300,000 search fixations. We thoroughly characterize COCO-Search18 and benchmark it using three machine-learning methods: a ResNet50 object detector, a ResNet50 trained on fixation-density maps, and an inverse-reinforcement-learning model trained on behavioral search scanpaths. Models were also trained and tested on images transformed to approximate a foveated retina, a fundamental biological constraint. These models, each having a different reliance on behavioral training, collectively comprise the new state of the art in predicting goal-directed search fixations. Our expectation is that future work using COCO-Search18 will far surpass these initial efforts, finding applications in domains ranging from human-computer interactive systems that can anticipate a person's intent and render assistance, to the potential early identification of attention-related clinical disorders (ADHD, PTSD, phobia) based on deviations from neurotypical fixation behavior.
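The benchmark models are only named in the abstract; as one concrete illustration of the kind of training signal involved, the Python sketch below shows a common way to turn recorded fixations into the fixation-density maps mentioned above. This is a generic sketch, not the authors' pipeline: the function name, the coordinate format, and the smoothing width are assumptions.

# A minimal sketch (not the authors' code) of building a fixation-density
# map from recorded search fixations. The (x, y) coordinate format and the
# smoothing width sigma_px (roughly one degree of visual angle in pixels,
# a common choice in saliency work) are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def fixation_density_map(fixations, height, width, sigma_px=30.0):
    """Accumulate fixation points into a smoothed, normalized density map."""
    dmap = np.zeros((height, width), dtype=np.float64)
    for x, y in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < height and 0 <= xi < width:
            dmap[yi, xi] += 1.0  # one unit of mass per fixation
    dmap = gaussian_filter(dmap, sigma=sigma_px)  # spread point mass into a density
    total = dmap.sum()
    return dmap / total if total > 0 else dmap  # probability map over pixels

# Hypothetical usage: pooled fixations from several observers on one image.
fixations = [(830, 520), (845, 498), (790, 540)]
density = fixation_density_map(fixations, height=1050, width=1680)

Normalizing the map to sum to one makes it directly usable as a ground-truth distribution for the probabilistic metrics commonly used to evaluate fixation-prediction models.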
Publisher
Springer Science and Business Media LLC
Cited by
13 articles.
1. Explaining Disagreement in Visual Question Answering Using Eye Tracking. Proceedings of the 2024 Symposium on Eye Tracking Research and Applications, 2024-06-04.
2. Oculomotor routines for perceptual judgments. Journal of Vision, 2024-05-06.
3. Visual ScanPath Transformer: Guiding Computers to See the World. 2023 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2023-10-16.
4. Modeling Visual Impairments with Artificial Neural Networks: a Review. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), 2023-10-02.
5. Oculomotor routines for perceptual judgments. 2023-09-27.