Author:
Amoreena Most, Aaron Chase, Andrea Sikora
Abstract
Background
Large language models (LLMs) such as ChatGPT have emerged as promising artificial intelligence tools to support clinical decision-making. The ability of ChatGPT to evaluate medication regimens, identify drug-drug interactions (DDIs), and provide clinical recommendations is unknown. The purpose of this study was to examine the performance of GPT-4 in identifying clinically relevant DDIs and to assess the accuracy of the recommendations it provided.
Methods
A total of 15 medication regimens were created containing commonly encountered DDIs considered either clinically significant or clinically unimportant. Two separate prompts were developed for medication regimen evaluation. The primary outcome was whether GPT-4 identified the most relevant DDI within each medication regimen. Secondary outcomes included rating GPT-4's interaction rationale, clinical relevance ranking, and overall clinical recommendations. Interrater reliability was determined using the kappa statistic.
Results
GPT-4 identified the intended DDI in 90% of the medication regimens provided (27/30). GPT-4 categorized 86% of interactions as highly clinically relevant, compared with 53% categorized as highly clinically relevant by expert opinion. Inappropriate clinical recommendations with the potential to cause patient harm appeared in 14% of GPT-4's responses (2/14), and 63% of responses contained accurate information but incomplete recommendations (19/30).
Conclusions
While GPT-4 demonstrated promise in its ability to identify clinically relevant DDIs, application to clinical cases remains an area of investigation. Findings from this study may assist in the future development and refinement of LLMs for drug-drug interaction queries to support clinical decision-making.
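The Methods mention interrater reliability measured with the kappa statistic. As a minimal sketch of how Cohen's kappa is computed for two raters, the function below takes two parallel lists of labels; the example ratings are purely illustrative placeholders, not data from this study.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    who each labeled the same n items."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed agreement: fraction of items labeled identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence: sum over categories of the
    # product of each rater's marginal frequency for that category.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical relevance ratings from two raters (not the study's data):
r1 = ["high", "high", "low", "low", "high"]
r2 = ["high", "low", "low", "low", "high"]
print(round(cohens_kappa(r1, r2), 3))  # → 0.615
```

Values near 1 indicate near-perfect agreement beyond chance; values near 0 indicate agreement no better than chance.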
Publisher
Cold Spring Harbor Laboratory