SiamUT: Siamese Unsymmetrical Transformer-like Tracking
Published: 2023-07-19
Container-title: Electronics
Volume: 12
Issue: 14
Page: 3133
ISSN: 2079-9292
Language: en
Author:
Yang Lingyu 1, Zhou Hao 1, Yuan Guowu 1, Xia Mengen 1, Chen Dong 1, Shi Zhiliang 2, Chen Enbang 2
Affiliation:
1. School of Information Science and Engineering, Yunnan University, Kunming 650504, China; 2. Kunming Enersun Technology Co., Ltd., Kunming 650504, China
Abstract
Siamese networks have proven well suited to many computer vision tasks, including single-object tracking. These trackers leverage the Siamese structure to benefit from feature cross-correlation, which measures the similarity between a target template and the corresponding search region. However, the linear nature of the correlation operation loses important semantic information and may yield suboptimal performance under complex background interference or significant object deformation. In this paper, we introduce the Transformer structure, which has been successful in vision tasks, to enhance the Siamese network’s performance in challenging conditions. By incorporating self-attention and cross-attention mechanisms, we modify the original Transformer into an asymmetrical version that can attend to different regions of the feature map. This Transformer-like fusion network enables a more efficient and effective fusion procedure. Additionally, we introduce a two-layer output structure with decoupled prediction heads, improved loss functions, and window-penalty post-processing. This design improves both the classification and the regression branches. Extensive experiments on large public datasets such as LaSOT, GOT-10k, and TrackingNet demonstrate that the proposed SiamUT tracker achieves state-of-the-art precision on most benchmark datasets.
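The feature cross-correlation mentioned above can be illustrated with a minimal sketch. This is not the paper's code: it is a naive pure-Python loop over assumed list-of-lists feature maps that computes what SiamFC-style trackers implement as a convolution of the search features with the template features — each response-map entry is the inner product of the template with the aligned window of the search region.

```python
# Minimal sketch (illustrative only, not the authors' implementation) of the
# linear cross-correlation used by Siamese trackers to score template/search
# similarity. Feature maps are represented as nested lists: channels x H x W.

def cross_correlation(z, x):
    """Slide template feature map z over search feature map x.

    z: template features, shape (c, hz, wz)
    x: search-region features, shape (c, hx, wx)
    Returns a response map of shape (hx - hz + 1, wx - wz + 1); each entry
    is the sum over channels of the elementwise product between z and the
    aligned window of x, i.e. a similarity score for that displacement.
    """
    c = len(z)
    hz, wz = len(z[0]), len(z[0][0])
    hx, wx = len(x[0]), len(x[0][0])
    out = [[0.0] * (wx - wz + 1) for _ in range(hx - hz + 1)]
    for i in range(hx - hz + 1):          # vertical displacement
        for j in range(wx - wz + 1):      # horizontal displacement
            s = 0.0
            for k in range(c):            # sum over feature channels
                for u in range(hz):
                    for v in range(wz):
                        s += z[k][u][v] * x[k][i + u][j + v]
            out[i][j] = s
    return out


# Toy usage: a 1-channel diagonal template matched against a diagonal search
# map. The response peaks where the window matches the template exactly.
template = [[[1.0, 0.0],
             [0.0, 1.0]]]
search = [[[1.0, 0.0, 0.0],
           [0.0, 1.0, 0.0],
           [0.0, 0.0, 1.0]]]
response = cross_correlation(template, search)
# response == [[2.0, 0.0], [0.0, 2.0]]
```

Because every entry of the response map is a fixed linear function of the features, the operation cannot adaptively re-weight semantic content, which is the limitation the asymmetrical attention-based fusion in SiamUT is designed to address.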
Funder
Special Fund for Key Program of Science and Technology of Yunnan Province, China
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering