CBAs: Character-level Backdoor Attacks against Chinese Pre-trained Language Models

Authors:

He Xinyu¹, Hao Fengrui², Gu Tianlong², Chang Liang¹

Affiliation:

1. Guilin University of Electronic Technology, Guilin, China

2. Jinan University, Guangzhou, China

Abstract

Pre-trained language models (PLMs) provide natural and efficient language interaction and text processing capabilities across many domains. However, recent studies have shown that PLMs are highly vulnerable to malicious backdoor attacks, in which triggers are injected into a model to make it exhibit attacker-specified behavior. Unfortunately, existing research on backdoor attacks has focused mainly on English PLMs and paid little attention to Chinese PLMs; moreover, existing backdoor attacks do not work well against Chinese PLMs. In this article, we disclose the limitations of English backdoor attacks against Chinese PLMs and propose character-level backdoor attacks (CBAs) against Chinese PLMs. Specifically, we first design three Chinese trigger generation strategies to ensure that the backdoor is reliably triggered while improving the effectiveness of the attack. Then, depending on the attacker's ability to access the training dataset, we develop trigger injection mechanisms based on either target-label similarity or a masked language model, which select the most influential position and insert the trigger there to maximize the stealthiness of the attack. Extensive experiments on three major natural language processing tasks across various Chinese and English PLMs demonstrate the effectiveness and stealthiness of our method. In addition, CBAs show strong resistance against three state-of-the-art backdoor defense methods.
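The masked-language-model injection mechanism described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `mlm_score` is a hypothetical stand-in for a real Chinese masked LM's likelihood of the trigger character at a candidate position, and the trigger character and example sentence are invented.

```python
def insert_trigger(sentence: str, trigger: str, mlm_score) -> str:
    """Insert `trigger` at the position a masked language model rates
    most natural, approximating the paper's stealth objective.

    mlm_score(candidate, pos) is an assumed callback standing in for a
    real Chinese MLM (e.g., the probability the model assigns to the
    trigger character when that position is masked)."""
    best_pos, best_score = 0, float("-inf")
    for pos in range(len(sentence) + 1):
        # Candidate poisoned sentence with the trigger at `pos`.
        candidate = sentence[:pos] + trigger + sentence[pos:]
        score = mlm_score(candidate, pos)
        if score > best_score:
            best_pos, best_score = pos, score
    return sentence[:best_pos] + trigger + sentence[best_pos:]

# Toy deterministic scorer (illustration only): prefers the position
# right after the first character.
toy_score = lambda cand, pos: -abs(pos - 1)
print(insert_trigger("今天天气很好", "呗", toy_score))  # → 今呗天天气很好
```

In practice the scorer would query a Chinese masked LM such as a BERT-style model, so the trigger character lands where it reads most fluently and is hardest to spot.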

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

