Large Language Model Capabilities in Perioperative Risk Prediction and Prognostication

Published:2024-08-01 Issue:8 Volume:159 Page:928
ISSN:2168-6254
Container-title:JAMA Surgery
language:en
Short-container-title:JAMA Surg

Author:

Chung Philip¹,Fong Christine T.²,Walters Andrew M.²,Aghaeepour Nima¹,Yetisgen Meliha³⁴,O’Reilly-Shah Vikas N.²

Affiliation:

1. Department of Anesthesiology, Perioperative & Pain Medicine, Stanford University, Stanford, California

2. Department of Anesthesiology & Pain Medicine, University of Washington, Seattle

3. Department of Biomedical & Health Informatics, University of Washington, Seattle

4. Department of Linguistics, University of Washington, Seattle

Abstract

Importance: General-domain large language models may be able to perform risk stratification and predict postoperative outcome measures using a description of the procedure and a patient’s electronic health record notes.

Objective: To examine predictive performance on 8 different tasks: prediction of American Society of Anesthesiologists Physical Status (ASA-PS), hospital admission, intensive care unit (ICU) admission, unplanned admission, hospital mortality, postanesthesia care unit (PACU) phase 1 duration, hospital duration, and ICU duration.

Design, Setting, and Participants: This prognostic study included task-specific datasets constructed from 2 years of retrospective electronic health records data collected during routine clinical care. Case and note data were formatted into prompts and given to the large language model GPT-4 Turbo (OpenAI) to generate a prediction and explanation. The setting included a quaternary care center comprising 3 academic hospitals and affiliated clinics in a single metropolitan area. Patients who had a surgery or procedure with anesthesia and at least 1 clinician-written note filed in the electronic health record before surgery were included in the study. Data were analyzed from November to December 2023.

Exposures: Compared original notes, note summaries, few-shot prompting, and chain-of-thought prompting strategies.

Main Outcomes and Measures: F1 score for binary and categorical outcomes. Mean absolute error for numerical duration outcomes.

Results: Study results were measured on task-specific datasets, each with 1000 cases with the exception of unplanned admission, which had 949 cases, and hospital mortality, which had 576 cases. The best results for each task included an F1 score of 0.50 (95% CI, 0.47-0.53) for ASA-PS, 0.64 (95% CI, 0.61-0.67) for hospital admission, 0.81 (95% CI, 0.78-0.83) for ICU admission, 0.61 (95% CI, 0.58-0.64) for unplanned admission, and 0.86 (95% CI, 0.83-0.89) for hospital mortality prediction. Performance on duration prediction tasks was universally poor across all prompt strategies for which the large language model achieved a mean absolute error of 49 minutes (95% CI, 46-51 minutes) for PACU phase 1 duration, 4.5 days (95% CI, 4.2-5.0 days) for hospital duration, and 1.1 days (95% CI, 0.9-1.3 days) for ICU duration prediction.

Conclusions and Relevance: Current general-domain large language models may assist clinicians in perioperative risk stratification on classification tasks but are inadequate for numerical duration predictions. Their ability to produce high-quality natural language explanations for the predictions may make them useful tools in clinical workflows and may be complementary to traditional risk prediction models.
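For readers unfamiliar with the two outcome measures named in the abstract, the following is a minimal Python sketch of how an F1 score and a mean absolute error are computed in general. It is not the authors' evaluation code: the function names and the toy labels and durations are hypothetical, and the abstract does not state how F1 was averaged across classes for the categorical ASA-PS task, so only the single-positive-class (binary) case is shown.

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are 0 or undefined; report 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data, for illustration only (not from the study).
admitted_true = [1, 0, 1, 1, 0, 0]   # 1 = admitted, 0 = not admitted
admitted_pred = [1, 0, 0, 1, 1, 0]
pacu_minutes_true = [60, 90, 45]     # actual durations in minutes
pacu_minutes_pred = [75, 60, 50]     # predicted durations in minutes

print(round(f1_score(admitted_true, admitted_pred), 3))
print(round(mean_absolute_error(pacu_minutes_true, pacu_minutes_pred), 1))
```

A higher F1 score indicates better classification, whereas a lower mean absolute error indicates better duration prediction, which is why the two sets of results in the abstract are read in opposite directions.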

Publisher

American Medical Association (AMA)

Link

https://jamanetwork.com/journals/jamasurgery/articlepdf/2819795/jamasurgery_chung_2024_oi_240033_1723061355.5978.pdf

Cited by 2 articles.

1. Supercharge Your Academic Productivity with Generative Artificial Intelligence;Journal of Medical Systems;2024-08-08

2. Travel Guide From the Brave New World of Artificial Intelligence;JAMA Surgery;2024-08-01