Data Stealing Attacks against Large Language Models via Backdooring
-
Published:2024-07-19
Issue:14
Volume:13
Page:2858
-
ISSN:2079-9292
-
Container-title:Electronics
-
language:en
-
Short-container-title:Electronics
Author:
He Jiaming1, Hou Guanyu1, Jia Xinyue1, Chen Yangyang1, Liao Wenqi1, Zhou Yinhang2, Zhou Rang1
Affiliation:
1. College of Computer Science and Cyber Security (Oxford Brookes College), Chengdu University of Technology, Chengdu 610059, China 2. Software College, Shenyang Normal University, Shenyang 110034, China
Abstract
Large language models (LLMs) have gained immense attention and are being increasingly applied in various domains. However, this technological leap forward poses serious security and privacy concerns. This paper explores a novel approach to data stealing attacks by introducing an adaptive method to extract private training data from pre-trained LLMs via backdooring. Our method mainly focuses on the scenario of model customization and is conducted in two phases, including backdoor training and backdoor activation, which allow for the extraction of private information without prior knowledge of the model’s architecture or training data. During the model customization stage, attackers inject the backdoor into the pre-trained LLM by poisoning a small ratio of the training dataset. During the inference stage, attackers can extract private information from the third-party knowledge database by incorporating the pre-defined backdoor trigger. Our method leverages the customization process of LLMs, injecting a stealthy backdoor that can be triggered after deployment to retrieve private data. We demonstrate the effectiveness of our proposed attack through extensive experiments, achieving a notable attack success rate. Extensive experiments demonstrate the effectiveness of our stealing attack in popular LLM architectures, as well as stealthiness during normal inference.
Reference33 articles.
1. Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., and Anadkat, S. (2023). Gpt-4 technical report. arXiv. 2. Palm: Scaling language modeling with pathways;Chowdhery;J. Mach. Learn. Res.,2023 3. Li, Z., Fan, S., Gu, Y., Li, X., Duan, Z., Dong, B., Liu, N., and Wang, J. (2024, January 20–27). FlexKBQA: A flexible LLM-powered framework for few-shot knowledge base question answering. Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada. 4. Yang, H., Zhang, M., Tao, S., Wang, M., Wei, D., and Jiang, Y. (2024, January 4–7). Knowledge-prompted estimator: A novel approach to explainable machine translation assessment. Proceedings of the 2024 26th International Conference on Advanced Communications Technology (ICACT), Pyeong Chang, Republic of Korea. 5. Zhang, M., Tu, M., Zhang, F., and Liu, S. (2024, January 14–19). A Cross Search Method for Data Augmentation in Neural Machine Translation. Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea.
|
|