Abstract
Visual tracking of generic objects is one of the fundamental but challenging problems in computer vision. Here, we propose a novel fully convolutional Siamese network to solve visual tracking by directly predicting the target bounding box in an end-to-end manner. We first reformulate the visual tracking task as two subproblems: a classification problem for pixel category prediction and a regression task for object status estimation at that pixel. With this decomposition, we design a simple yet effective Siamese-architecture-based classification and regression framework, termed SiamCAR, which consists of two subnetworks: a Siamese subnetwork for feature extraction and a classification-regression subnetwork for direct bounding box prediction. Since the proposed framework is both proposal- and anchor-free, SiamCAR avoids the tedious hyper-parameter tuning of anchors, considerably simplifying the training. To demonstrate that a much simpler tracking framework can achieve superior tracking results, we conduct extensive experiments and comparisons with state-of-the-art trackers on several challenging benchmarks. Without bells and whistles, SiamCAR achieves leading performance at real-time speed. Furthermore, the ablation study validates that the proposed framework is effective with various backbone networks and can benefit from deeper networks. Code is available at https://github.com/ohhhyeahhh/SiamCAR.
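To illustrate the anchor-free classification-regression idea described above, the following is a minimal PyTorch sketch of such a head: template and search features from a Siamese backbone are fused by depth-wise cross-correlation, and per-pixel class scores, a center-ness score, and (l, t, r, b) box offsets are predicted on the response map. The layer names, channel sizes, and tower depths here are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of an anchor-free Siamese classification-regression head.
import torch
import torch.nn as nn
import torch.nn.functional as F


def depthwise_xcorr(search, kernel):
    """Depth-wise cross-correlation of search features with template features."""
    b, c, h, w = search.shape
    kernel = kernel.reshape(b * c, 1, kernel.size(2), kernel.size(3))
    out = F.conv2d(search.reshape(1, b * c, h, w), kernel, groups=b * c)
    return out.reshape(b, c, out.size(2), out.size(3))


class CARHead(nn.Module):
    """Classification-regression head: per-pixel class scores and box offsets."""

    def __init__(self, channels=256):
        super().__init__()
        self.cls_tower = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.reg_tower = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.cls = nn.Conv2d(channels, 2, 3, padding=1)   # foreground / background
        self.cen = nn.Conv2d(channels, 1, 3, padding=1)   # center-ness score
        self.reg = nn.Conv2d(channels, 4, 3, padding=1)   # (l, t, r, b) offsets

    def forward(self, template_feat, search_feat):
        resp = depthwise_xcorr(search_feat, template_feat)
        cls_feat, reg_feat = self.cls_tower(resp), self.reg_tower(resp)
        # exp keeps the regressed distances positive
        return self.cls(cls_feat), self.cen(cls_feat), torch.exp(self.reg(reg_feat))


# Usage with dummy Siamese features (template 7x7, search 31x31, 256 channels).
head = CARHead()
z = torch.randn(1, 256, 7, 7)
x = torch.randn(1, 256, 31, 31)
cls, cen, box = head(z, x)
print(cls.shape, cen.shape, box.shape)  # per-pixel maps on a 25x25 response grid
```

Because every location on the response map directly regresses its own box, no anchor boxes (and hence no anchor hyper-parameters) are needed, which is the simplification the abstract emphasizes.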
Funder
National Natural Science Foundation of China
Natural Science Foundation of Jiangsu Province
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software