TFITrack: Transformer Feature Integration Network for Object Tracking

Published:2024-04-29 Issue:1 Volume:17
ISSN:1875-6883
Container-title:International Journal of Computational Intelligence Systems
language:en
Short-container-title:Int J Comput Intell Syst

Author:

Hu Xiuhua, Liu Huan, Li Shuang, Zhao Jing, Hui Yan

Abstract

Because convolutional neural networks ignore rich spatio-temporal and global contextual information during feature extraction, traditional methods are prone to tracking drift or even failure in complex scenarios, especially for tiny targets in aerial photography. This work proposes a transformer feature integration network (TFITrack) that obtains diverse and comprehensive target features for robust object tracking. Building on the typical transformer architecture, it optimizes the encoder and decoder structure to aggregate discriminative spatio-temporal information and global context-aware features. Furthermore, the encoder introduces a similarity calculation layer and a dual-attention module, which deepen the similarity between features and apply corrections along the channel and spatial dimensions, improving the feature representation. Finally, a temporal context filtering layer adaptively ignores unimportant feature information, balancing a reduced parameter count against stable performance. Experimental results show that the proposed tracking algorithm performs excellently on seven benchmark datasets, especially the aerial datasets UAV123, UAV20L, and UAV123@10fps, demonstrating the advantages of the method in dealing with fast motion and external interference.
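
The abstract does not give the formulas for the similarity calculation or the filtering step, so the following is only a minimal sketch of the general idea, not the TFITrack layers themselves. It assumes cosine similarity as the similarity measure and a fixed top-k rule for discarding unimportant features; the names `cosine_similarity`, `filter_tokens`, and `keep` are hypothetical and chosen for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def filter_tokens(template, tokens, keep):
    """Score each token against the template and keep the `keep` best.

    Returns the kept token indices (in their original order) and all scores.
    Assumption: a fixed top-k rule stands in for the adaptive filtering
    described in the abstract.
    """
    scores = [cosine_similarity(template, t) for t in tokens]
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep]), scores


# Toy data: one template feature and four search-region features.
template = [1.0, 0.0, 1.0]
tokens = [
    [2.0, 0.0, 2.0],    # same direction as the template
    [0.0, 1.0, 0.0],    # orthogonal to the template
    [1.0, 1.0, 1.0],    # partially aligned
    [-1.0, 0.0, -1.0],  # opposite direction
]
kept, scores = filter_tokens(template, tokens, keep=2)
print(kept)
```

In this toy example the aligned and partially aligned features are kept and the orthogonal and opposite ones are dropped. A learned, adaptive criterion, as the abstract describes, would replace the fixed `keep` value.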

Funder

Natural Science Basic Research Program of Shaanxi Province

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s44196-024-00500-0.pdf

Cited by 1 article.

1. Security in Transformer Visual Trackers: A Case Study on the Adversarial Robustness of Two Models; Sensors; 2024-07-22