CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models

Authors:

He Xinyu¹, Hao Fengrui², Gu Tianlong², Chang Liang¹

Affiliation:

1. Guilin University of Electronic Technology, Guilin, China

2. Jinan University, Guangzhou, China

Abstract

Pre-trained language models (PLMs) provide natural and efficient language interaction and text processing capabilities across many domains. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, in which triggers are injected into a model to make it exhibit attacker-specified behavior. Unfortunately, existing research on backdoor attacks has focused mainly on English PLMs and paid little attention to Chinese PLMs; moreover, existing backdoor attacks do not work well against Chinese PLMs. In this article, we disclose the limitations of English backdoor attacks against Chinese PLMs and propose character-level backdoor attacks (CBAs) against Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies to ensure that the backdoor is reliably triggered while improving the effectiveness of the attack. Then, depending on the attacker's ability to access the training dataset, we develop trigger injection mechanisms based on either target-label similarity or a masked language model, which select the most influential position and insert the trigger there to maximize the stealthiness of the attack. Extensive experiments on three major natural language processing tasks across various Chinese and English PLMs demonstrate the effectiveness and stealthiness of our method. In addition, CBAs show strong resistance against three state-of-the-art backdoor defense methods.
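The masked-language-model injection mechanism described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `mlm_score` is a hypothetical stand-in for a real Chinese masked LM's likelihood of the trigger character at a candidate position, and the trigger character and example sentence are invented.

```python
def insert_trigger(sentence: str, trigger: str, mlm_score) -> str:
    """Insert `trigger` at the position a masked language model rates
    most natural, approximating the paper's stealth objective.

    mlm_score(candidate, pos) is an assumed callback standing in for a
    real Chinese MLM (e.g., the probability the model assigns to the
    trigger character when that position is masked)."""
    best_pos, best_score = 0, float("-inf")
    for pos in range(len(sentence) + 1):
        # Candidate poisoned sentence with the trigger at `pos`.
        candidate = sentence[:pos] + trigger + sentence[pos:]
        score = mlm_score(candidate, pos)
        if score > best_score:
            best_pos, best_score = pos, score
    return sentence[:best_pos] + trigger + sentence[best_pos:]

# Toy deterministic scorer (illustration only): prefers the position
# right after the first character.
toy_score = lambda cand, pos: -abs(pos - 1)
print(insert_trigger("今天天气很好", "呗", toy_score))  # → 今呗天天气很好
```

In practice the scorer would query a Chinese masked LM such as a BERT-style model, so the trigger character lands where it reads most fluently and is hardest to spot.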

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

