Affiliation:
1. College of Computer Science and Technology, Xi’an University of Science and Technology, Xi’an 710054, China
Abstract
Traditional Siamese object tracking algorithms use a convolutional neural network as the backbone and have achieved good tracking precision. However, because they lack global information and make insufficient use of spatial and scale information, their accuracy and speed still need to be improved in complex scenarios such as rapid motion and illumination variation. To address these problems, we propose SSTrack, an object tracking algorithm based on spatial scale attention. We use a dilated convolution branch and covariance pooling to build a spatial scale attention module that extracts the spatial and scale information of the target object. Embedding this module into Swin Transformer, which serves as the backbone, enhances the extraction of local detail and improves the success rate and precision of tracking. In addition, to reduce the computational complexity of self-attention, Exemplar Transformer is applied in the encoder structure. SSTrack achieves 71.5% average overlap (AO) on GOT-10k, 86.7% normalized precision (NP) on TrackingNet, and 68.4% area under curve (AUC) on LaSOT. The tracking speed reaches 28 fps, which meets the requirement for real-time object tracking.
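As an illustration of the average overlap (AO) metric reported above, the following minimal sketch computes the mean intersection over union (IoU) between predicted and ground-truth boxes for a single sequence. The function names `iou` and `average_overlap` and the `(x, y, width, height)` box layout are assumptions made for this example; the benchmark's official toolkit may aggregate over sequences or classes differently, and this is not the authors' evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x, y, width, height).

    The (x, y, width, height) layout is an assumption made for this sketch.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(predicted, ground_truth):
    """Mean per-frame IoU over one sequence (hypothetical helper, not a toolkit API)."""
    if len(predicted) != len(ground_truth):
        raise ValueError("sequences must have the same number of frames")
    if not predicted:
        raise ValueError("sequences must not be empty")
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(overlaps) / len(overlaps)
```

For example, a two-frame sequence in which the tracker matches the ground truth exactly in the first frame and misses the target completely in the second yields an AO of 0.5.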
Funder
National Key Research and Development Program of China