Affiliation:
1. Dalian Minzu University
2. Harbin Institute of Technology
Abstract
Object trackers based on Siamese networks treat tracking as a similarity-matching problem. However, the correlation operation is a local linear matching process, which limits the tracker's ability to capture the intricate nonlinear relationship between the template and search-region branches. Moreover, most trackers do not update the template and rely on the first frame of the sequence as the only template, which easily leads to poor tracking performance when the target undergoes deformation, scale variation, or occlusion. To this end, we propose a Siamese tracking network with a multi-attention mechanism, consisting of a template branch and a search branch. To adapt to changes in target appearance, we integrate dynamic templates and the multi-attention mechanism in the template branch, obtaining a more effective feature representation by fusing the features of the initial and dynamic templates. To enhance the robustness of the tracking model, the search branch uses a multi-attention mechanism that shares weights with the template branch and obtains a multi-scale feature representation by fusing search-region features at different scales. In addition, we design a simple, lightweight feature fusion mechanism in which a Transformer encoder structure fuses the information of the template and search areas, and the dynamic template is updated online based on confidence. Experimental results on public tracking datasets show that the proposed method achieves competitive results compared with several state-of-the-art trackers.
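As a rough illustration of the confidence-based online update mentioned above, the following minimal sketch keeps the initial template fixed and replaces the dynamic template only when the tracker's confidence is high enough. The names `TemplateState`, `update_dynamic_template`, and `fuse_templates`, the threshold of 0.7, and the weighted-average fusion are all assumptions made for illustration; the abstract does not specify the update rule, and the paper fuses templates with a multi-attention mechanism, not an average.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class TemplateState:
    """Hypothetical container for the two template feature vectors."""
    initial: List[float]  # features from the first frame; never modified
    dynamic: List[float]  # features from the latest reliable target crop


def update_dynamic_template(
    state: TemplateState,
    candidate: Sequence[float],
    confidence: float,
    threshold: float = 0.7,  # assumed value, not taken from the paper
) -> TemplateState:
    """Replace the dynamic template only when the prediction is confident."""
    if confidence >= threshold:
        return TemplateState(state.initial, list(candidate))
    # Low confidence (e.g. occlusion): keep the previous dynamic template.
    return state


def fuse_templates(state: TemplateState, weight: float = 0.5) -> List[float]:
    """Stand-in fusion: weighted average of initial and dynamic features.

    The paper uses multi-attention here; the average is only a placeholder.
    """
    return [
        (1.0 - weight) * a + weight * b
        for a, b in zip(state.initial, state.dynamic)
    ]
```

Gating the update on confidence keeps an occluded or drifting prediction from overwriting a reliable template, while the untouched initial template anchors the fused representation to the original target.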
Publisher
Research Square Platform LLC