Transferability of features for neural networks links to adversarial attacks and defences

Published:2022-04-27 Issue:4 Volume:17 Page:e0266060
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Kotyan Shashank, Matsuki Moe, Vargas Danilo Vasconcellos

Abstract

The reason for the existence of adversarial samples is still barely understood. Here, we explore the transferability of learned features to Out-of-Distribution (OoD) classes. We do this by assessing neural networks’ capability to encode the existing features, revealing an intriguing connection with adversarial attacks and defences. The principal idea is that, “if an algorithm learns rich features, such features should represent Out-of-Distribution classes as a combination of previously learned In-Distribution (ID) classes”. This is because OoD classes usually share several regular features with ID classes, given that the features learned are general enough. We further introduce two metrics to assess the transferred features representing OoD classes. One is based on inter-cluster validation techniques, while the other captures the influence of a class over learned features. Experiments suggest that several adversarial defences decrease the attack accuracy of some attacks and improve the transferability-of-features as measured by our metrics. Experiments also reveal a relationship between the proposed metrics and adversarial attacks (a high Pearson correlation coefficient and low p-value). Further, statistical tests suggest that several adversarial defences, in general, significantly improve transferability. Our tests suggest that models with higher transferability-of-features generally have higher robustness against adversarial attacks. Thus, the experiments suggest that the objectives of adversarial machine learning might be much closer to domain transfer learning than previously thought.
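
The relationship reported above between the proposed metrics and adversarial attacks is stated as a Pearson correlation coefficient. As a minimal sketch of what that statistic measures (this is not the authors' code), the coefficient can be computed as below. The function name `pearson_r`, the variable names, and the per-model scores are all hypothetical placeholders and are not results from the paper.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalised (the factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values only (NOT taken from the paper):
# one entry per hypothetical model.
transferability = [0.2, 0.4, 0.6, 0.8]  # assumed metric score per model
attack_success = [0.9, 0.7, 0.5, 0.3]   # assumed attack success rate per model

r = pearson_r(transferability, attack_success)
print(f"Pearson r = {r:.3f}")
```

In this toy setup a strongly negative coefficient would mean that models with higher metric scores are attacked less successfully. The p-value mentioned in the abstract needs a statistics library and is left out of this sketch.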

Funder

Japan Science and Technology Agency

Japan Society for the Promotion of Science

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References: 69 articles.

1. Szegedy C, et al. Intriguing properties of neural networks. In: ICLR. Citeseer; 2014.

2. Nguyen A, Yosinski J, Clune J. Deep neural networks are easily fooled: High confidence predictions for unrecognizable images. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2015. p. 427–436.

3. Brown TB, Mané D, Roy A, Abadi M, Gilmer J. Adversarial patch. arXiv preprint arXiv:1712.09665. 2017.

4. Moosavi-Dezfooli SM, Fawzi A, Fawzi O, Frossard P. Universal adversarial perturbations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2017. p. 1765–1773.

5. Su J, et al. One pixel attack for fooling deep neural networks. IEEE Transactions on Evolutionary Computation; 2019.

Cited by 1 article.

1. MCGCL: Adversarial attack on graph contrastive learning based on momentum gradient candidates. PLOS ONE; 2024-06-06.