Abstract
Convolutional neural networks (CNNs) have been the dominant architectures for feature extraction, but they do not selectively attend to specific image features. Correlation operations play an important role in visual tracking, yet they retain a large amount of unfavorable background information. In this paper, we propose an effective feature recognizer that uses channel and spatial attention modules to focus on important object feature information, which improves the representation power of the feature extraction network. Further, we design a multi-scale feature fusion network that fuses the template feature and encoded feature branches to establish connections between features at different scales. Experiments on six benchmarks demonstrate that the proposed tracker outperforms state-of-the-art trackers. In particular, the proposed tracker achieves an 80.4% AUC on TrackingNet and a 68.4% AUC on GOT-10k while running at real-time speed.
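The idea of combining channel and spatial attention can be illustrated with a minimal sketch. This is not the feature recognizer proposed in the paper: the abstract does not specify its layers, so the version below is a parameter-free simplification that assumes average pooling and sigmoid gating with no learned weights. The function names `channel_attention`, `spatial_attention` and `attend` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(fmap):
    """Scale each channel by a sigmoid gate of its global average.

    fmap is a list of C channels, each an H x W list of lists.
    Assumption: no learned weights; a real module would normally
    pass the pooled vector through a small trainable network.
    """
    out = []
    for channel in fmap:
        total = sum(sum(row) for row in channel)
        count = len(channel) * len(channel[0])
        gate = sigmoid(total / count)
        out.append([[value * gate for value in row] for row in channel])
    return out


def spatial_attention(fmap):
    """Scale each spatial position by a sigmoid gate of its channel mean.

    Assumption: the gate is computed from the cross-channel average only;
    a real module would normally apply a trainable convolution here.
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    gates = [
        [sigmoid(sum(fmap[k][i][j] for k in range(c)) / c) for j in range(w)]
        for i in range(h)
    ]
    return [
        [[fmap[k][i][j] * gates[i][j] for j in range(w)] for i in range(h)]
        for k in range(c)
    ]


def attend(fmap):
    """Apply channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(fmap))
```

In this sketch, channels and positions with larger average activation receive gates closer to 1, so they are attenuated less than weakly activated ones. The output keeps the input's shape, so the block could sit between any two stages of a feature extractor.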
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC