Self-Prompting Tracking: A Fast and Efficient Tracking Pipeline for UAV Videos
Published: 2024-02-21
Volume: 16, Issue: 5, Page: 748
ISSN: 2072-4292
Container-title: Remote Sensing
Short-container-title: Remote Sensing
Language: en
Authors:
Wang Zhixing 1,2,3,4,5; Zhou Gaofan 1,3; Yao Jinzhen 1,3; Zhang Jianlin 1,3 (ORCID); Bao Qiliang 1,3; Hu Qintao 1,3
Affiliations:
1. Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu 610209, China
2. School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China
3. Key Laboratory of Optical Engineering, Chinese Academy of Sciences, Chengdu 610209, China
4. School of Electronic, Electrical and Communication Engineering, Chinese Academy of Sciences, Beijing 100049, China
5. National Key Laboratory of Optical Field Manipulation Science and Technology, Chinese Academy of Sciences, Chengdu 610209, China
Abstract
In the realm of visual tracking, remote sensing videos captured by Unmanned Aerial Vehicles (UAVs) have seen significant advancements and wide application. However, conventional Transformer-based trackers still struggle to balance tracking accuracy and inference speed, a problem that is further exacerbated when Transformers are deployed at larger model scales. To address this challenge, we present a fast and efficient UAV tracking framework, denoted SiamPT, which aims to reduce the number of Transformer layers without losing the discriminative ability of the model. To this end, we transfer conventional prompting theory from multi-modal tracking to UAV tracking, proposing a novel self-prompting method that exploits the target's inherent characteristics in the search branch to discriminate targets from the background. Specifically, a self-distribution strategy is introduced to capture feature-level relationships, segmenting the tokens into distinct smaller patches. Salient tokens within the full attention map are then identified as foreground targets, enabling the fusion of local region information. These fused tokens serve as prompters that improve the identification of distractors, thereby avoiding the need for model expansion. SiamPT demonstrates impressive results on the UAV123 benchmark, achieving success and precision rates of 0.694 and 0.890, respectively, while maintaining an inference speed of 91.0 FPS.
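The salient-token selection and fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names (`select_salient_tokens`, `fuse_prompt`), the choice of column-summed attention as the saliency score, and mean-pooling as the fusion step are all assumptions made for clarity.

```python
import numpy as np

def select_salient_tokens(attn_map, k):
    """Rank tokens by total attention received and keep the top-k.

    attn_map: (N, N) full attention map over N search-region tokens.
    Returns the indices of the k most salient (foreground) tokens.
    """
    saliency = attn_map.sum(axis=0)          # attention received per token
    return np.argsort(saliency)[::-1][:k]    # indices, most salient first

def fuse_prompt(tokens, salient_idx):
    """Fuse the salient tokens into a single prompt vector (mean pooling here;
    the actual fusion in SiamPT may differ)."""
    return tokens[salient_idx].mean(axis=0)

# Toy example: 6 tokens with 4-dim features and a random attention map.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
attn = rng.random((6, 6))
idx = select_salient_tokens(attn, k=2)
prompt = fuse_prompt(tokens, idx)            # (4,) prompt vector
```

The resulting `prompt` vector would then be injected back into the tracker as an extra cue for distinguishing the target from distractors, which is the role the abstract assigns to the fused tokens.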
Cited by: 1 article.