Central Attention with Sliding Window for Efficient Visual Tracking

Authors:

Chen Zhen1, Xiao Xianbing1, Xiong Xingzhong1, Meng Fanqin1, Liu Jun1

Affiliation:

1. Sichuan University of Science and Engineering

Abstract

Cross-correlation is widely used for feature fusion, especially in Siamese-based trackers. However, it struggles to capture complex nonlinear relationships and is susceptible to outliers in the samples. Recently, researchers have used Transformers for feature fusion and achieved significant performance gains. However, most of these methods rely on modeling global token relationships, which can destroy the local and spatial correlations inherent in 2D structures. This paper proposes an efficient tracking algorithm based on central attention and sliding-window sampling, called SiamCAT. Specifically, a significant-context augmentation with sliding windows is proposed to maintain the stability of the 2D spatial structure of the input. It uses attention to simulate the way convolution processes 2D data, and an internal memory composed of learnable parameters enables dynamic adjustment of the attention layer. Second, to learn efficient feature fusion, this paper constructs a feature fusion network that effectively combines template features and search features. Experiments show that SiamCAT achieves state-of-the-art results on the LaSOT, OTB100, NFS, UAV123, GOT10K, and TrackingNet benchmarks and runs in real time at 47 frames per second on a CPU. The code will be released at https://github.com/cnchange/SiamCAT.
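The core mechanism sketched in the abstract, attention computed over a sliding local window and augmented with a small bank of learnable memory tokens, can be illustrated with a short example. The code below is a hypothetical PyTorch sketch based only on the abstract, not the authors' released implementation; the module name SlidingWindowAttention, the window size, the number of memory tokens, and all tensor shapes are assumptions made for illustration.

    # Hypothetical sketch (not the authors' code): attention over sliding
    # windows that mimics how a convolution scans a 2D feature map, with a
    # small set of learnable "memory" tokens mixed into every local window.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SlidingWindowAttention(nn.Module):
        def __init__(self, dim, window=7, num_memory=4):
            super().__init__()
            self.window = window
            self.qkv = nn.Linear(dim, dim * 3)
            self.proj = nn.Linear(dim, dim)
            # learnable internal memory that lets the layer adapt dynamically
            self.memory = nn.Parameter(torch.randn(num_memory, dim) * 0.02)

        def forward(self, x):  # x: (B, C, H, W)
            B, C, H, W = x.shape
            w = self.window
            # unfold into overlapping windows, like a convolution's receptive field
            patches = F.unfold(x, kernel_size=w, padding=w // 2)          # (B, C*w*w, H*W)
            patches = patches.view(B, C, w * w, H * W).permute(0, 3, 2, 1)  # (B, HW, w*w, C)
            mem = self.memory.expand(B, H * W, -1, -1)                    # (B, HW, M, C)
            tokens = torch.cat([patches, mem], dim=2)                     # keys/values per window
            center = x.flatten(2).transpose(1, 2).unsqueeze(2)            # query = window centre
            q = self.qkv(center)[..., :C]
            k = self.qkv(tokens)[..., C:2 * C]
            v = self.qkv(tokens)[..., 2 * C:]
            attn = (q @ k.transpose(-2, -1)) / C ** 0.5                   # (B, HW, 1, w*w+M)
            out = (attn.softmax(dim=-1) @ v).squeeze(2)                   # (B, HW, C)
            return self.proj(out).transpose(1, 2).view(B, C, H, W)

Applied to both the template and the search branch, the output of such a module could then be combined by a cross-attention step that matches template tokens against search tokens, which is roughly the role the abstract assigns to the feature fusion network.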

Publisher

Research Square Platform LLC

