Vignette-based comparative analysis of ChatGPT and specialist treatment decisions for rheumatic patients: results of the Rheum2Guide study

Published: 2024-08-10
Volume: 44
Issue: 10
Pages: 2043-2053
ISSN: 1437-160X
Container-title: Rheumatology International
Short-container-title: Rheumatol Int
Language: en
Authors: Labinsky Hannah, Nagler Lea-Kristin, Krusche Martin, Griewing Sebastian, Aries Peer, Kroiß Anja, Strunz Patrick-Pascal, Kuhn Sebastian, Schmalzing Marc, Gernert Michael, Knitza Johannes
Abstract
Background
The complex nature of rheumatic diseases poses considerable challenges for clinicians when developing individualized treatment plans. Large language models (LLMs) such as ChatGPT could enable treatment decision support.
Objective
To compare treatment plans generated by GPT-3.5 and GPT-4 with those of a clinical rheumatology board (RB).
Design/methods
Fictional patient vignettes were created, and GPT-3.5, GPT-4, and the RB were each queried to provide first- and second-line treatment plans with underlying justifications. Four rheumatologists from different centers, blinded to the origin of the treatment plans, selected the overall preferred treatment concept and rated each plan's safety, EULAR guideline adherence, medical adequacy, overall quality, justification, and completeness, as well as patient vignette difficulty, on a 5-point Likert scale.
Results
Twenty fictional vignettes covering various rheumatic diseases and difficulty levels were assembled, yielding a total of 160 ratings. In 68.8% (110/160) of cases, raters preferred the RB's treatment plans over those generated by GPT-4 (16.3%; 26/160) and GPT-3.5 (15.0%; 24/160). GPT-4's plans were chosen more frequently for first-line treatments than GPT-3.5's. No significant safety differences were observed between the RB's and GPT-4's first-line treatment plans. The rheumatologists' plans received significantly higher ratings for guideline adherence, medical appropriateness, completeness, and overall quality. Ratings did not correlate with vignette difficulty. LLM-generated plans were notably longer and more detailed.
Conclusion
GPT-4 and GPT-3.5 generated safe, high-quality treatment plans for rheumatic diseases, demonstrating promise in clinical decision support. Future research should investigate detailed standardized prompts and the impact of LLM usage on clinical decisions.
Funders: GlaxoSmithKline Biologicals; Philipps-Universität Marburg
Publisher: Springer Science and Business Media LLC
Cited by: 1 article