Abstract
Prompt engineering in large language models (LLMs), combined with external context, can be misused for jailbreaks that elicit malicious outputs. In this process, jailbreak prompts appear to be amplified to the point that LLMs can generate malicious outputs at scale despite their initial training. Distributed through social bots, such outputs can contribute to the spread of misinformation, hate speech, and discriminatory content. Using GPT4-x-Vicuna-13b-4bit from NousResearch, we demonstrate the effectiveness of jailbreak prompts and external contexts in a Jupyter Notebook written in Python. We also outline the methodological foundations of prompt engineering and its potential to produce malicious content, in order to sensitize researchers, practitioners, and policymakers to the importance of responsible development and deployment of LLMs.
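To illustrate the kind of notebook setup the abstract refers to, the following Python sketch loads a 4-bit quantized Vicuna-style model and queries it with a prompt that combines an instruction with external context. It is a minimal sketch under stated assumptions, not the article's actual experiment: the Hugging Face identifier "NousResearch/gpt4-x-vicuna-13b-4bit", the Alpaca-style prompt template, and the availability of a GPTQ-capable backend (e.g. the auto-gptq package) are assumptions, and the jailbreak prompts themselves are deliberately not reproduced.

# Minimal sketch: load a 4-bit quantized model and run a benign instruction with external context.
# Assumptions: the Hub model id, the prompt template, and an installed GPTQ backend (e.g. auto-gptq).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "NousResearch/gpt4-x-vicuna-13b-4bit"  # assumed Hub identifier

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map="auto",  # place the quantized weights on the available GPU/CPU
)

# External context is injected into the prompt alongside the instruction;
# the jailbreak prompts used in the article are intentionally omitted here.
external_context = "Large language models can be steered by the text placed in their prompt."
instruction = "Summarize the context above in one sentence."

prompt = (
    "### Instruction:\n"
    f"{external_context}\n\n{instruction}\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.7,
    )

# Print only the newly generated tokens, i.e. the model's response to the prompt.
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

In the setting described in the abstract, the same pipeline is fed jailbreak prompts and additional external context instead of the benign instruction above.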