Not What You've Signed Up For: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection

Published: 2023-11-26
Container-title: Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security

Author:

Greshake Kai¹, Abdelnabi Sahar², Mishra Shailesh¹, Endres Christoph³, Holz Thorsten², Fritz Mario²

Affiliation:

1. Saarland University, Saarbrücken, Germany

2. CISPA Helmholtz Center for Information Security, Saarbrücken, Germany

3. sequire technology GmbH, Saarbrücken, Germany

Funder

European Union

Publisher

ACM

Link

https://dl.acm.org/doi/pdf/10.1145/3605764.3623985

References: 71 articles.

1. Alex Albert. 2023. Jailbreak Chat. https://www.jailbreakchat.com/

2. Jacob Andreas. 2022. Language models as agent models. In Findings of EMNLP.

3. Giovanni Apruzzese, Hyrum Anderson, Savino Dambra, David Freeman, Fabio Pierazzi, and Kevin Roundy. 2022. Position: "Real Attackers Don't Compute Gradients": Bridging the Gap Between Adversarial ML Research and Practice. In SaTML.

4. Yuntao Bai, Andy Jones, Kamal Ndousse, Amanda Askell, Anna Chen, Nova DasSarma, Dawn Drain, Stanislav Fort, Deep Ganguli, Tom Henighan, et al. 2022. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv (2022).

5. Nora Belrose, Zach Furman, Logan Smith, Danny Halawi, Igor Ostrovsky, Lev McKinney, Stella Biderman, and Jacob Steinhardt. 2023. Eliciting Latent Predictions from Transformers with the Tuned Lens. arXiv (2023).

Cited by 26 articles.

1. FDI: Attack Neural Code Generation Systems through User Feedback Channel;Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis;2024-09-11

2. Easy-read and large language models: on the ethical dimensions of LLM-based text simplification;Ethics and Information Technology;2024-08-04

3. The Inadequacy of Reinforcement Learning From Human Feedback—Radicalizing Large Language Models via Semantic Vulnerabilities;IEEE Transactions on Cognitive and Developmental Systems;2024-08

4. Do large language models have a legal duty to tell the truth?;Royal Society Open Science;2024-08

5. Forensic Analysis of Artifacts from Microsoft's Multi-Agent LLM Platform AutoGen;Proceedings of the 19th International Conference on Availability, Reliability and Security;2024-07-30