On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification

Published:2020-11-14 Issue:22 Volume:10 Page:8079
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Park Sanglee, So Jungmin

Abstract

State-of-the-art neural network models are actively used in various fields, but it is well known that they are vulnerable to adversarial example attacks. Efforts to make these models robust against such attacks have shown that this is a very difficult task. While many defense approaches have been shown to be ineffective, adversarial training remains one of the promising methods. In adversarial training, the training data are augmented by “adversarial” samples generated using an attack algorithm. If the attacker uses a similar attack algorithm to generate adversarial examples, the adversarially trained network can be quite robust to the attack. However, there are numerous ways of creating adversarial examples, and the defender does not know what algorithm the attacker may use. A natural question is: Can we use adversarial training to train a model robust to multiple types of attack? Previous work has shown that, when a network is trained with adversarial examples generated from multiple attack methods, the network is still vulnerable to white-box attacks, where the attacker has complete access to the model parameters. In this paper, we study this question in the context of black-box attacks, which can be a more realistic assumption for practical applications. Experiments with the MNIST dataset show that adversarially training a network with an attack method helps defend against that particular attack method, but has limited effect against other attack methods. In addition, even if the defender trains a network with multiple types of adversarial examples and the attacker attacks with one of the methods, the network could lose accuracy to the attack if the attacker uses a different data augmentation strategy on the target network. These results show that it is very difficult to make a robust network using adversarial training, even for black-box settings where the attacker has restricted information on the target network.
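The augmentation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the fast gradient sign method (FGSM) as the attack algorithm, a toy logistic-regression classifier, and synthetic 2-D data in place of MNIST. All function names (`fgsm`, `train`, `make_data`, and so on) and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_prob(w, b, x):
    # Probability of class 1 under a linear logistic model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def fgsm(w, b, x, y, eps):
    # FGSM (assumed attack): shift each input coordinate by eps in the
    # direction of the sign of the loss gradient with respect to the input.
    # For logistic loss, that gradient is (p - y) * w.
    p = predict_prob(w, b, x)
    grad_x = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad_x)]


def train(data, eps=0.3, epochs=50, lr=0.1, adversarial=False):
    # Plain SGD. With adversarial=True, every epoch augments the clean data
    # with adversarial samples generated against the current model.
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        batch = list(data)
        if adversarial:
            batch += [(fgsm(w, b, x, y, eps), y) for x, y in data]
        for x, y in batch:
            err = predict_prob(w, b, x) - y
            for i in range(dim):
                w[i] -= lr * err * x[i]
            b -= lr * err
    return w, b


def accuracy(w, b, data):
    # Fraction of samples whose thresholded prediction matches the label.
    hits = sum((predict_prob(w, b, x) >= 0.5) == (y == 1) for x, y in data)
    return hits / len(data)


def make_data(n, rng):
    # Synthetic two-class data: Gaussian blobs around (-1, -1) and (1, 1).
    data = []
    for _ in range(n):
        y = rng.randint(0, 1)
        c = 1.0 if y == 1 else -1.0
        data.append(([c + rng.gauss(0, 0.3), c + rng.gauss(0, 0.3)], y))
    return data


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_data(100, rng)
    w, b = train(data, eps=0.3, adversarial=True)
    attacked = [(fgsm(w, b, x, y, 0.3), y) for x, y in data]
    print("clean accuracy:", accuracy(w, b, data))
    print("accuracy under FGSM:", accuracy(w, b, attacked))
```

This sketch attacks the model with the same method it was trained on, and with full access to its parameters. The black-box, cross-attack setting studied in the paper would instead generate the evaluation examples with a different attack, crafted on a separately trained substitute model.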

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/10/22/8079/pdf

Cited by 20 articles.

1. An AI-driven solution to prevent adversarial attacks on mobile Vehicle-to-Microgrid services;Simulation Modelling Practice and Theory;2024-12

2. An approach to improve transferability of adversarial examples;Physical Communication;2024-06

3. Adversarial Attacks and Defenses in Deep Learning-Based Computer Vision Systems;2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI);2024-05-24

4. Evaluating the Robustness of Deep Learning Models against Adversarial Attacks: An Analysis with FGSM, PGD and CW;Big Data and Cognitive Computing;2024-01-16

5. Toward Universal Detection of Adversarial Examples via Pseudorandom Classifiers;IEEE Transactions on Information Forensics and Security;2024