Abstract
Code suggestions from generative language models such as ChatGPT can contain vulnerabilities because they often reproduce outdated code and programming practices that are over-represented in the older code repositories on which the models were trained. Advanced attackers can exploit this by injecting code with known but hard-to-detect vulnerabilities into training datasets. Mitigations include user education and engineered safeguards such as LLMs trained for vulnerability detection or rule-based checking of codebases. Effective vulnerability detection and mitigation also require analysis of LLMs' code generation capabilities, including formal verification and analysis of the source training data (code-comment pairs).
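As a minimal sketch of what the rule-based checking mentioned above could look like, the following Python snippet scans a suggested code string for a handful of well-known risky calls. The rule set, function names, and example payload are illustrative assumptions, not the checkers studied in the paper.

```python
# Hypothetical rule-based check for risky calls in LLM-suggested Python code.
import ast

# Each rule maps a dangerous call name to a short description of the risk.
RISKY_CALLS = {
    "eval": "arbitrary code execution from untrusted input",
    "exec": "arbitrary code execution from untrusted input",
    "pickle.loads": "deserialization of untrusted data",
    "yaml.load": "unsafe YAML loading (prefer yaml.safe_load)",
}


def call_name(node: ast.Call) -> str:
    """Return a dotted name like 'pickle.loads' for a call node, if resolvable."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name):
        return f"{func.value.id}.{func.attr}"
    return ""


def scan_source(source: str) -> list[tuple[int, str, str]]:
    """Flag calls matching RISKY_CALLS; returns (line, call, reason) tuples."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = call_name(node)
            if name in RISKY_CALLS:
                findings.append((node.lineno, name, RISKY_CALLS[name]))
    return findings


if __name__ == "__main__":
    # Example suggestion a code assistant might produce (hypothetical).
    suggested_code = (
        "import pickle\n"
        "data = pickle.loads(payload)\n"
        "result = eval(user_input)\n"
    )
    for lineno, name, reason in scan_source(suggested_code):
        print(f"line {lineno}: {name} -- {reason}")
```

Such lightweight AST-based rules catch only known, syntactically obvious patterns; they complement, rather than replace, learned vulnerability detectors and formal verification.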
Publisher
Springer Nature Switzerland