Protocol For Human Evaluation of Artificial Intelligence Chatbots in Clinical Consultations

Published: 2024-03-02

Author:

Chiu Edwin Kwan-Yeung, Chung Tom Wai-Hin

Abstract

Background: Generative artificial intelligence (AI) technology has revolutionary potential to augment clinical practice and telemedicine. The nuances of real-life patient scenarios and complex clinical environments demand a rigorous, evidence-based approach to ensure safe and effective application.

Methods: We present a protocol for the systematic evaluation of generative AI large language models (LLMs) as chatbots within the context of clinical microbiology and infectious disease consultations. We aim to critically assess the clinical accuracy, comprehensiveness, coherence, and safety of recommendations produced by leading generative AI models, including Claude 2, Gemini Pro, GPT-4.0, and a GPT-4.0-based custom AI chatbot.

Discussion: A standardised healthcare-specific prompt template is employed to elicit clinically impactful AI responses. Generated responses will be graded by a panel of human evaluators with a wide spectrum of domain expertise in clinical microbiology and virology, and in clinical infectious diseases. Evaluations are performed using a 5-point Likert scale across four clinical domains: factual consistency, comprehensiveness, coherence, and medical harmfulness. Our study will offer insights into the feasibility, limitations, and boundaries of generative AI in healthcare, providing guidance for future research and clinical implementation. Ethical guidelines and safety guardrails should be developed to uphold patient safety and clinical standards.
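The grading scheme described in the abstract (human evaluators scoring each response on a 5-point Likert scale across four domains) could be tabulated along the following lines. This is only a minimal sketch: the abstract does not say how scores are stored or aggregated, so the `Rating` record, the `summarise` helper, the evaluator labels, and the choice of mean and median as summary statistics are all assumptions made for illustration. The scores shown are made up and are not study data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median

# The four clinical domains named in the abstract.
DOMAINS = (
    "factual_consistency",
    "comprehensiveness",
    "coherence",
    "medical_harmfulness",
)


@dataclass(frozen=True)
class Rating:
    """One evaluator's Likert score for one model in one domain (hypothetical record)."""
    model: str
    evaluator: str
    domain: str
    score: int

    def __post_init__(self):
        # Reject anything outside the four domains or the 5-point scale.
        if self.domain not in DOMAINS:
            raise ValueError(f"unknown domain: {self.domain}")
        if not 1 <= self.score <= 5:
            raise ValueError(f"score must be between 1 and 5, got {self.score}")


def summarise(ratings):
    """Group scores by (model, domain) and report count, mean, and median."""
    grouped = defaultdict(list)
    for r in ratings:
        grouped[(r.model, r.domain)].append(r.score)
    return {
        key: {"n": len(scores), "mean": mean(scores), "median": median(scores)}
        for key, scores in grouped.items()
    }


# Made-up scores for illustration only; not data from the study.
example = [
    Rating("Claude 2", "evaluator_A", "coherence", 4),
    Rating("Claude 2", "evaluator_B", "coherence", 5),
    Rating("Gemini Pro", "evaluator_A", "coherence", 3),
]
summary = summarise(example)
```

Validating the domain and score when each record is created keeps out-of-range entries from entering the summary. Whether the mean, the median, or another statistic suits ordinal Likert data is a separate analysis decision that the abstract leaves open.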

Publisher

Cold Spring Harbor Laboratory

References: 22 articles.

1. Human-like problem-solving abilities in large language models using ChatGPT. Frontiers in Artificial Intelligence. 2023.

2. ChatGPT and antimicrobial advice: the end of the consulting infection doctor?

3. Dyckhoff-Shen S, Koedel U, Brouwer MC, Bodilsen J, Klein M. ChatGPT fails challenging the recent ESCMID brain abscess guideline. Journal of Neurology. 2024:1–16.

4. Schwartz IS, Link KE, Daneshjou R, Cortés-Penfield N. Black box warning: large language models and the future of infectious diseases consultation. Clinical Infectious Diseases. 2023:ciad633.

5. Maillard A, Micheli G, Lefevre L, Guyonnet C, Poyart C, Canouï E, et al. Can Chatbot Artificial Intelligence Replace Infectious Diseases Physicians in the Management of Bloodstream Infections? A Prospective Cohort Study. Clinical Infectious Diseases. 2023:ciad632.

Cited by 2 articles.

1. Generative artificial intelligence models in clinical infectious disease consultations: a cross-sectional analysis among specialists and resident trainees. 2024-08-19.

2. Exploring the Role of Generative AI in Medical Microbiology Education: Enhancing Bacterial Identification Skills in Laboratory Students. Communications in Computer and Information Science. 2024.