Evolutionary Multi-objective Optimization for Contextual Adversarial Example Generation

Published: 2024-07-12 Issue: FSE Volume: 1 Page: 2285-2308
ISSN: 2994-970X
Container-title: Proceedings of the ACM on Software Engineering
Language: en
Short-container-title: Proc. ACM Softw. Eng.

Author:

Zhou Shasha¹, Huang Mingyu², Sun Yanan³, Li Ke⁴

Affiliation:

1. University of Electronic Science and Technology of China, Chengdu, China / University of Exeter, Exeter, United Kingdom

2. University of Electronic Science and Technology of China, Chengdu, China

3. Sichuan University, Chengdu, China

4. University of Exeter, Exeter, United Kingdom

Abstract

The emergence of the 'code naturalness' concept, which suggests that software code shares statistical properties with natural language, paves the way for deep neural networks (DNNs) in software engineering (SE). However, DNNs can be vulnerable to certain human-imperceptible variations in the input, known as adversarial examples (AEs), which can degrade model performance. Numerous attack strategies have been proposed to generate AEs in computer vision and natural language processing, but far fewer exist for source code in SE. One of the challenges stems from various constraints, including syntactic and semantic constraints and a minimal modification ratio. These constraints, however, are subjective and can conflict with the goal of fooling DNNs. This paper develops a multi-objective adversarial attack method (dubbed MOAA): a tailored version of NSGA-II, a powerful evolutionary multi-objective (EMO) algorithm, integrated with CodeT5 to generate high-quality AEs based on the contextual information of the original code snippet. Experiments on 5 source code tasks with 10 datasets in 6 different programming languages show that our approach can generate a diverse set of high-quality AEs with promising transferability. In addition, using our AEs, we provide, for the first time, insights into the internal behavior of pre-trained models.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3660808

References: 92 articles.

1. Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey

2. Miltiadis Allamanis, Earl T. Barr, Premkumar T. Devanbu, and Charles Sutton. 2018. A Survey of Machine Learning for Big Code and Naturalness. ACM Comput. Surv., 51, 4 (2018), 81:1–81:37.

3. Bander Alsulami, Edwin Dauber, Richard E. Harang, Spiros Mancoridis, and Rachel Greenstadt. 2017. Source Code Authorship Attribution Using Long Short-Term Memory Based Networks. In ESORICS’17: Proc. of the 22nd European Symposium on Research in Computer Security. 10492, 65–82.

4. HypE: An Algorithm for Fast Hypervolume-Based Many-Objective Optimization

5. Patrick Bareiß, Beatriz Souza, Marcelo d’Amorim, and Michael Pradel. 2022. Code Generation Tools (Almost) for Free? A Study of Few-Shot Pre-Trained Language Models on Code. CoRR.