Abstract
Visual tracking is an important field of computer vision research. Although transformer-based trackers have achieved remarkable performance, the transformer's global computation is inefficient: it does not screen for important patches and cannot focus on key target regions. Temporal motion features are also easily overlooked. To address these problems, this paper proposes a new method, SKRT, that removes the CNN structure and uses a transformer directly as the backbone network to extract multiframe video features. These feature maps are then mixed and superimposed to obtain spatiotemporal information. To focus efficiently on the important parts, we use key region extraction to obtain a small set of template and search feature map patches and feed them back into the transformer as a cross-correlation computation. Finally, we predict the position of the tracked object through center-corner prediction. To demonstrate the effectiveness of our method, we conduct experiments on challenging benchmark datasets (GOT-10K, TrackingNet, VOT2018, OTB100, and LaSOT); the results show that SKRT is competitive with other state-of-the-art methods.
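The key region extraction step is described only at a high level here. As a rough illustration of the general idea of keeping a small set of patches, the following sketch ranks patch feature vectors by an importance score and keeps the top k. The function name `select_key_patches`, the use of the L2 norm as the score, and the toy data are all assumptions made for illustration; the abstract does not specify how SKRT scores or selects patches.

```python
import math
from typing import List, Sequence, Tuple


def select_key_patches(
    patches: Sequence[Sequence[float]], k: int
) -> Tuple[List[int], List[List[float]]]:
    """Keep the k patches with the largest L2 norm.

    The L2 norm is an assumed stand-in for an importance score;
    it is not the scoring rule used by SKRT.
    """
    if not 0 < k <= len(patches):
        raise ValueError("k must be between 1 and the number of patches")

    # Score every patch feature vector (hypothetical importance measure).
    scores = [math.sqrt(sum(v * v for v in patch)) for patch in patches]

    # Indices of the k highest-scoring patches.
    top = sorted(range(len(patches)), key=lambda i: scores[i], reverse=True)[:k]

    # Restore the original spatial order so positions stay meaningful.
    kept = sorted(top)
    return kept, [list(patches[i]) for i in kept]


# Toy example: four 2-dimensional patch features, keep two.
toy_patches = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.2], [1.0, 1.0]]
kept_indices, kept_patches = select_key_patches(toy_patches, k=2)
print(kept_indices)  # indices of the retained patches
```

In a tracker of this kind, the retained template and search patches would then form a shorter token sequence for the subsequent transformer stage, which is where the efficiency gain would come from under this assumption.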
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics,Engineering (miscellaneous),Information Systems,Artificial Intelligence
Cited by
2 articles.