Abstract
Visual tracking of generic objects is one of the fundamental but challenging problems in computer vision. Here, we propose a novel fully convolutional Siamese network to solve visual tracking by directly predicting the target bounding box in an end-to-end manner. We first reformulate the visual tracking task as two subproblems: a classification problem for pixel category prediction and a regression task for object status estimation at that pixel. With this decomposition, we design a simple yet effective Siamese-architecture-based classification and regression framework, termed SiamCAR, which consists of two subnetworks: a Siamese subnetwork for feature extraction and a classification-regression subnetwork for direct bounding box prediction. Since the proposed framework is both proposal- and anchor-free, SiamCAR avoids the tedious hyper-parameter tuning of anchors, considerably simplifying the training. To demonstrate that a much simpler tracking framework can achieve superior tracking results, we conduct extensive experiments and comparisons with state-of-the-art trackers on several challenging benchmarks. Without bells and whistles, SiamCAR achieves leading performance at real-time speed. Furthermore, the ablation study validates that the proposed framework is effective with various backbone networks and can benefit from deeper networks. Code is available at https://github.com/ohhhyeahhh/SiamCAR.
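To illustrate the anchor-free classification-regression idea described above, the following is a minimal PyTorch sketch of such a head: template and search features from a Siamese backbone are fused by depth-wise cross-correlation, and per-pixel class scores, a center-ness score, and (l, t, r, b) box offsets are predicted on the response map. The layer names, channel sizes, and tower depths here are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch of an anchor-free Siamese classification-regression head.
import torch
import torch.nn as nn
import torch.nn.functional as F


def depthwise_xcorr(search, kernel):
    """Depth-wise cross-correlation of search features with template features."""
    b, c, h, w = search.shape
    kernel = kernel.reshape(b * c, 1, kernel.size(2), kernel.size(3))
    out = F.conv2d(search.reshape(1, b * c, h, w), kernel, groups=b * c)
    return out.reshape(b, c, out.size(2), out.size(3))


class CARHead(nn.Module):
    """Classification-regression head: per-pixel class scores and box offsets."""

    def __init__(self, channels=256):
        super().__init__()
        self.cls_tower = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.reg_tower = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True))
        self.cls = nn.Conv2d(channels, 2, 3, padding=1)   # foreground / background
        self.cen = nn.Conv2d(channels, 1, 3, padding=1)   # center-ness score
        self.reg = nn.Conv2d(channels, 4, 3, padding=1)   # (l, t, r, b) offsets

    def forward(self, template_feat, search_feat):
        resp = depthwise_xcorr(search_feat, template_feat)
        cls_feat, reg_feat = self.cls_tower(resp), self.reg_tower(resp)
        # exp keeps the regressed distances positive
        return self.cls(cls_feat), self.cen(cls_feat), torch.exp(self.reg(reg_feat))


# Usage with dummy Siamese features (template 7x7, search 31x31, 256 channels).
head = CARHead()
z = torch.randn(1, 256, 7, 7)
x = torch.randn(1, 256, 31, 31)
cls, cen, box = head(z, x)
print(cls.shape, cen.shape, box.shape)  # per-pixel maps on a 25x25 response grid
```

Because every location on the response map directly regresses its own box, no anchor boxes (and hence no anchor hyper-parameters) are needed, which is the simplification the abstract emphasizes.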
Funder
National Natural Science Foundation of China
Natural Science Foundation of Jiangsu Province
Publisher
Springer Science and Business Media LLC
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Software