Evaluating the Effectiveness and Safety of Large Language Model in Generating Type 2 Diabetes Mellitus Management Plans: A Comparative Study with Medical Experts Based on Real Patient Records

Authors:

Mondal Agnibho, Naskar Arindam, Roy Choudhury Bhaskar, Chakraborty Sambudhya, Biswas Tanmay, Sinha Sumanta

Abstract

Background: The integration of large language models (LLMs) such as GPT-4 into healthcare presents potential benefits and challenges. While LLMs have shown promise in applications ranging from scientific writing to personalized medicine, their practical utility and safety in clinical settings remain under scrutiny. Concerns about accuracy, ethical considerations and bias necessitate rigorous evaluation of these technologies against established medical standards.

Objective: To compare the completeness, necessity, dosage accuracy and overall safety of type 2 diabetes management plans created by GPT-4 with those devised by medical experts.

Methods: This study involved a comparative analysis using anonymized patient records from a healthcare setting in West Bengal, India. Management plans for 50 Type 2 diabetes patients were generated by GPT-4 and three blinded medical experts. These plans were evaluated against a reference management plan based on American Diabetes Society guidelines. Completeness, necessity and dosage accuracy were quantified and an error score was devised to assess the quality of the generated management plans. The safety of the management plans generated by GPT-4 was also assessed.

Results: Results indicated that medical experts’ management plans had fewer missing medications compared to those generated by GPT-4 (p=0.008). However, GPT-4 generated management plans included fewer unnecessary medications (p=0.003). No significant difference was observed in the accuracy of drug dosages (p=0.975). The overall error scores were comparable between human experts and GPT-4 (p=0.301). Safety issues were noted in 16% of the plans generated by GPT-4, highlighting potential risks associated with AI-generated management plans.

Conclusion: The study demonstrates that while GPT-4 can effectively reduce unnecessary drug prescriptions, it does not yet match the performance of medical experts in terms of plan completeness and safety. The findings support the use of LLMs as supplementary tools in healthcare, underscoring the need for enhanced algorithms and continuous human oversight to ensure the efficacy and safety of AI applications in clinical settings. Further research is necessary to improve the integration of LLMs into complex healthcare environments.

Publisher

Cold Spring Harbor Laboratory
