SiamHSFT: A Siamese Network-Based Tracker with Hierarchical Sparse Fusion and Transformer for UAV Tracking
Author:
Hu Xiuhua12, Zhao Jing12, Hui Yan12, Li Shuang12, You Shijie12
Affiliation:
1. School of Computer Science and Engineering, Xi’an Technological University, Xi’an 710021, China 2. State and Provincial Joint Engineering Laboratory of Advanced Network, Monitoring and Control, Xi’an 710021, China
Abstract
Due to high maneuverability as well as hardware limitations of Unmanned Aerial Vehicle (UAV) platforms, tracking targets in UAV views often encounter challenges such as low resolution, fast motion, and background interference, which make it difficult to strike a compatibility between performance and efficiency. Based on the Siamese network framework, this paper proposes a novel UAV tracking algorithm, SiamHSFT, aiming to achieve a balance between tracking robustness and real-time computation. Firstly, by combining CBAM attention and downward information interaction in the feature enhancement module, the provided method merges high-level and low-level feature maps to prevent the loss of information when dealing with small targets. Secondly, it focuses on both long and short spatial intervals within the affinity in the interlaced sparse attention module, thereby enhancing the utilization of global context and prioritizing crucial information in feature extraction. Lastly, the Transformer’s encoder is optimized with a modulation enhancement layer, which integrates triplet attention to enhance inter-layer dependencies and improve target discrimination. Experimental results demonstrate SiamHSFT’s excellent performance across diverse datasets, including UAV123, UAV20L, UAV123@10fps, and DTB70. Notably, it performs better in fast motion and dynamic blurring scenarios. Meanwhile, it maintains an average tracking speed of 126.7 fps across all datasets, meeting real-time tracking requirements.
Funder
Natural Science Basic Research Project of the Shaanxi Provincial Department of Science and Technology
Subject
Electrical and Electronic Engineering,Biochemistry,Instrumentation,Atomic and Molecular Physics, and Optics,Analytical Chemistry
Reference66 articles.
1. Held, D., Thrun, S., and Savarese, S. (2016, January 8–16). Learning to track at 100 fps with deep regression networks. Proceedings of the European Conference Computer Vision (ECCV), Amsterdam, The Netherlands. 2. Tao, R., Gavves, E., and Smeulders, A.W.M. (2016, January 27–30). Siamese instance search for tracking. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA. 3. Bertinetto, L., Valmadre, J., Henriques, J.F., Vedaldi, A., and Torr, P.H. (2016, January 8–16). Fully-convolutional siamese networks for object tracking. Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands. 4. Li, B., Yan, J., Wu, W., Zhu, Z., and Hu, X. (2018, January 18–22). High performance visual tracking with siamese region proposal network. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA. 5. Li, B., Wu, W., Wang, Q., Zhang, F., Xing, J., and Yan, J. (2019, January 16–20). Siamrpn++: Evolution of siamese visual tracking with very deep networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA.
|
|