CARL: Unsupervised Code-Based Adversarial Attacks for Programming Language Models via Reinforcement Learning

Published: 2024-08-14
ISSN:1049-331X
Container-title:ACM Transactions on Software Engineering and Methodology
language:en
Short-container-title:ACM Trans. Softw. Eng. Methodol.

Author:

Yao Kaichun¹, Wang Hao², Qin Chuan³, Zhu Hengshu⁴, Wu Yanjun¹, Zhang Libo⁵

Affiliation:

1. Institute of Software, Chinese Academy of Sciences, China

2. School of Computer Science, University of Chinese Academy of Sciences, China

3. PBC School of Finance, Tsinghua University, China

4. Computer Network Information Center, Chinese Academy of Sciences, China

5. Institute of Software, Chinese Academy of Sciences, China

Abstract

Code-based adversarial attacks play a crucial role in revealing vulnerabilities in software systems. Recently, pre-trained programming language models (PLMs) have demonstrated remarkable success in various significant software engineering tasks, progressively transforming the paradigm of software development. Despite their impressive capabilities, these powerful models are vulnerable to adversarial attacks. Therefore, it is necessary to carefully investigate the robustness and vulnerabilities of PLMs by means of adversarial attacks. Adversarial attacks entail imperceptible input modifications that cause target models to make incorrect predictions. Existing approaches for attacking PLMs often employ either identifier renaming or a greedy algorithm, which may yield sub-optimal performance or lead to high inference times. In response to these limitations, we propose CARL, an unsupervised black-box attack model that leverages reinforcement learning to generate imperceptible adversarial examples. Specifically, CARL comprises a programming language encoder and a perturbation prediction layer. To achieve more effective and efficient attacks, we cast the task as a sequential decision-making process and optimize it via policy gradient with a suite of reward functions. We conduct extensive experiments to validate the effectiveness of CARL on code summarization, code translation, and code refinement tasks, covering various programming languages and PLMs. The experimental results demonstrate that CARL surpasses state-of-the-art code attack models, achieving the highest attack success rate across multiple tasks and PLMs while maintaining high attack efficiency, imperceptibility, consistency, and fluency.
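
The policy-gradient idea mentioned in the abstract can be illustrated with a small, self-contained sketch. This is not CARL's implementation: the abstract does not specify the encoder, the perturbation layer, or the reward functions, so everything below is an assumption made for illustration. The sketch uses a plain REINFORCE update over independent per-token "perturb or keep" decisions, and `toy_reward`, `train_policy`, the `sensitive` set, and the edit penalty are hypothetical stand-ins for a black-box victim model and an imperceptibility term.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def toy_reward(actions, sensitive, penalty=0.2):
    """Hypothetical black-box reward (an assumption, not CARL's reward suite).

    +1 if at least one 'sensitive' token position was perturbed (the toy
    attack succeeds), minus a small cost per edit as a crude stand-in for
    imperceptibility.
    """
    success = 1.0 if any(actions[i] for i in sensitive) else 0.0
    return success - penalty * sum(actions)


def train_policy(num_tokens, sensitive, steps=3000, lr=0.1, seed=0):
    """REINFORCE over independent Bernoulli perturb/keep decisions."""
    rng = random.Random(seed)
    logits = [0.0] * num_tokens  # one learnable logit per token position
    baseline = 0.0               # running mean reward, reduces variance
    for _ in range(steps):
        probs = [sigmoid(z) for z in logits]
        # Sample one perturbation decision per token from the current policy.
        actions = [1 if rng.random() < p else 0 for p in probs]
        reward = toy_reward(actions, sensitive)
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # For a Bernoulli(sigmoid(z)) policy, d log pi(a) / dz = a - p.
        for i in range(num_tokens):
            logits[i] += lr * advantage * (actions[i] - probs[i])
    return [sigmoid(z) for z in logits]


if __name__ == "__main__":
    # Position 2 is the only edit that flips the toy victim's prediction.
    probs = train_policy(num_tokens=5, sensitive={2})
    print([round(p, 3) for p in probs])
```

In this toy setting the policy should learn to perturb only the position that changes the outcome, because every other edit is penalized without raising the success term. A real attack would replace `toy_reward` with queries to the target PLM and would condition the policy on an encoding of the code rather than on fixed per-position logits.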

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3688839
