Security in Transformer Visual Trackers: A Case Study on the Adversarial Robustness of Two Models
Authors:
Ye Peng 1,2,3, Chen Yuanfang 1,2, Ma Sihang 1,2, Xue Feng 4, Crespi Noel 5, Chen Xiaohan 1,2, Fang Xing 1,2
Affiliations:
1. School of Cyberspace, Hangzhou Dianzi University, Hangzhou 310018, China
2. Key Laboratory of Discrete Industrial Internet of Things of Zhejiang, Hangzhou 310018, China
3. DBAPPSecurity Co., Ltd., Hangzhou 310051, China
4. ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou 310058, China
5. Institut Polytechnique de Paris, Institut Mines-Telecom, 91120 Paris, France
Abstract
Visual object tracking is an important technology in camera-based sensor networks and has broad practical applications in autonomous driving systems. The transformer is a deep learning model that uses the self-attention mechanism to differentially weight the significance of each part of its input, and it has been widely adopted in the field of visual tracking. Unfortunately, the security of the transformer model remains unclear, which exposes transformer-based applications to security threats. In this work, the security of the transformer model was investigated through an important component of autonomous driving, i.e., visual tracking. Such deep-learning-based visual tracking is vulnerable to adversarial attacks, so adversarial attacks were implemented as the security threats for this investigation. First, adversarial examples were generated on top of video sequences to degrade tracking performance, taking frame-by-frame temporal motion into account when generating perturbations over the predicted tracking results. Then, the influence of the perturbations on tracking performance was investigated and analyzed. Finally, extensive experiments on the OTB100, VOT2018, and GOT-10k data sets demonstrated that the generated adversarial examples effectively degraded the performance of transformer-based visual tracking. White-box attacks were the most effective, with attack success rates exceeding 90% against the transformer-based trackers.
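To make the attack setting concrete, the following is a minimal PyTorch sketch of how a frame-level white-box perturbation of this kind might be generated with an iterative gradient-sign (PGD-style) attack against a generic tracker. It is not the paper's actual implementation: the `tracker` callable, the box-distance loss, the warm-starting of each frame's perturbation from the previous frame (a simple stand-in for exploiting frame-by-frame temporal motion), and all parameter values are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def attack_frame(tracker, frame, true_box, prev_delta,
                 eps=8 / 255, alpha=2 / 255, steps=10):
    """Hypothetical PGD-style attack on one video frame.

    `tracker(frame)` is assumed to return a predicted bounding-box tensor.
    The loss is *ascended* so the prediction is pushed away from the true
    box, degrading tracking. `prev_delta` warm-starts the perturbation
    from the previous frame, exploiting temporal coherence between frames.
    """
    delta = prev_delta.clone().detach()
    for _ in range(steps):
        delta.requires_grad_(True)
        pred_box = tracker(frame + delta)
        loss = F.l1_loss(pred_box, true_box)          # box-distance loss (assumed)
        grad = torch.autograd.grad(loss, delta)[0]
        with torch.no_grad():
            delta = delta + alpha * grad.sign()               # ascend the loss
            delta = delta.clamp(-eps, eps)                    # L-inf budget
            delta = (frame + delta).clamp(0.0, 1.0) - frame   # keep pixels valid
    return delta.detach()

def attack_sequence(tracker, frames, boxes):
    """Attack a whole sequence, carrying the perturbation across frames."""
    delta = torch.zeros_like(frames[0])
    adversarial_frames = []
    for frame, box in zip(frames, boxes):
        delta = attack_frame(tracker, frame, box, delta)
        adversarial_frames.append((frame + delta).clamp(0.0, 1.0))
    return adversarial_frames
```

Warm-starting from the previous frame's perturbation is one plausible way to account for temporal motion cheaply: because consecutive frames change little, few PGD steps per frame suffice once the first frame's perturbation has converged.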
Funder
Department of Science and Technology of Zhejiang Province