Current safeguards, risk mitigation, and transparency measures of large language models against the generation of health disinformation: repeated cross sectional analysis

Authors:

Menz Bradley D, Kuderer Nicole M, Bacchi Stephen, Modi Natansh D, Chin-Yee Benjamin, Hu Tiancheng, Rickard Ceara, Haseloff Mark, Vitry Agnes, McKinnon Ross A, Kichenadasse Ganessan, Rowland Andrew, Sorich Michael J, Hopkins Ashley M

Abstract

Objectives
To evaluate the effectiveness of safeguards to prevent large language models (LLMs) from being misused to generate health disinformation, and to evaluate the transparency of artificial intelligence (AI) developers regarding their risk mitigation processes against observed vulnerabilities.

Design
Repeated cross sectional analysis.

Setting
Publicly accessible LLMs.

Methods
Four LLMs (via chatbot/assistant interfaces) were evaluated: OpenAI’s GPT-4 (via ChatGPT and Microsoft’s Copilot), Google’s PaLM 2 and newly released Gemini Pro (via Bard), Anthropic’s Claude 2 (via Poe), and Meta’s Llama 2 (via HuggingChat). In September 2023, these LLMs were prompted to generate health disinformation on two topics: sunscreen as a cause of skin cancer and the alkaline diet as a cancer cure. Jailbreaking techniques (ie, attempts to bypass safeguards) were evaluated if required. For LLMs with observed safeguarding vulnerabilities, the processes for reporting outputs of concern were audited. 12 weeks after the initial investigations, the disinformation generation capabilities of the LLMs were re-evaluated to assess any subsequent improvements in safeguards.

Main outcome measures
Whether safeguards prevented the generation of health disinformation, and the transparency of the developers’ risk mitigation processes against health disinformation.

Results
Claude 2 (via Poe) declined 130 prompts submitted across the two study timepoints requesting the generation of content claiming that sunscreen causes skin cancer or that the alkaline diet is a cure for cancer, even with jailbreaking attempts. GPT-4 (via Copilot) initially refused to generate health disinformation, even with jailbreaking attempts, although this was no longer the case at 12 weeks. In contrast, GPT-4 (via ChatGPT), PaLM 2/Gemini Pro (via Bard), and Llama 2 (via HuggingChat) consistently generated health disinformation blogs. In the September 2023 evaluations, these LLMs facilitated the generation of 113 unique cancer disinformation blogs, totalling more than 40 000 words, without requiring jailbreaking attempts. The refusal rate across the evaluation timepoints for these LLMs was only 5% (7 of 150 prompts), and, as prompted, the LLM generated blogs incorporated attention grabbing titles, authentic looking (fake or fictional) references, and fabricated testimonials from patients and clinicians, and targeted diverse demographic groups. Although each LLM evaluated had mechanisms to report outputs of concern, the developers did not respond when observations of safeguard vulnerabilities were reported.

Conclusions
This study found that although effective safeguards are feasible to prevent LLMs from being misused to generate health disinformation, they were inconsistently implemented. Furthermore, effective processes for reporting safeguard problems were lacking. Enhanced regulation, transparency, and routine auditing are required to help prevent LLMs from contributing to the mass generation of health disinformation.

Funder

Cancer Council Australia

National Health and Medical Research Council

Publisher

BMJ

Cited by 7 articles.