Prompt engineering on leveraging large language models in generating response to InBasket messages

Author:

Sherry Yan (1,2), Wendi Knapp (2,3), Andrew Leong (2,4), Sarira Kadkhodazadeh (2,3), Souvik Das (2,5), Veena G Jones (3), Robert Clark (3), David Grattendick (6), Kevin Chen (2), Lisa Hladik (2), Lawrence Fagan (2,7), Albert Chan (2)

Affiliation:

1. Center for Health Systems Research, Sutter Health, Walnut Creek, CA 94596, United States

2. Sutter Health Data Science Team, Sutter Health, Sacramento, CA 95833, United States

3. Palo Alto Foundation Medical Group, Sutter Health, Palo Alto, CA 94301, United States

4. Sutter Digital Engagement, Sutter Health, Sacramento, CA 95833, United States

5. Sutter Data and Analytics Department, Sutter Health, Sacramento, CA 95833, United States

6. Sutter Medical Group, Sutter Health, Roseville, CA 95661, United States

7. Patient Family Advisory Council, Sutter Health, Palo Alto, CA 94301, United States

Abstract

Objectives

Large Language Models (LLMs) have been proposed as a solution to the high volume of Patient Medical Advice Requests (PMARs). This study examines whether, with prompt engineering, LLMs can generate high-quality draft responses to PMARs that satisfy both patients and clinicians.

Materials and Methods

We designed a novel human-in-the-loop iterative process to train and validate prompts for generating appropriate LLM responses to PMARs, using GPT-4 to draft the responses. At each iteration, we updated the prompt and evaluated both clinician and patient acceptance of the LLM-generated drafts, then tested the optimized prompt on independent validation datasets. The optimized prompt was implemented in the electronic health record production environment and tested by 69 primary care clinicians.

Results

After 3 iterations of prompt engineering, physician acceptance of draft suitability increased from 62% to 84% (P < .001) in the validation dataset (N = 200), and 74% of drafts in the test dataset were rated as "helpful." Patients also rated the optimized prompt's message tone (78%) and overall quality (80%) significantly more favorably than the original prompt's in the training dataset. Patients were unable to differentiate human- and LLM-generated draft PMAR responses for 76% of the messages, in contrast to their earlier preference for human-generated responses. A majority of clinicians (72%) believed the tool can reduce the cognitive load of dealing with InBasket messages.

Discussion and Conclusion

Informed synergistically by clinician and patient feedback, tuning the LLM prompt alone can be effective in creating clinically relevant and useful draft responses to PMARs.
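For illustration only: the study does not publish its optimized prompt or integration code. The minimal Python sketch below shows the general shape of the drafting step described in Materials and Methods, assuming the OpenAI Python SDK; the system prompt and the example patient message are hypothetical stand-ins, not the study's artifacts.

# Minimal sketch of generating a draft PMAR reply with GPT-4.
# Assumes the OpenAI Python SDK; the prompt text is a hypothetical
# stand-in for the study's optimized prompt, which is not published.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical system prompt approximating the drafting instructions.
SYSTEM_PROMPT = (
    "You are drafting a reply to a patient's medical advice request on "
    "behalf of their primary care clinician. Use a warm, empathetic tone "
    "and plain language, and keep the draft concise. Do not give a "
    "diagnosis; the clinician will review and edit the draft before it "
    "is sent."
)

def draft_pmar_response(patient_message: str) -> str:
    """Generate a draft reply to a Patient Medical Advice Request."""
    completion = client.chat.completions.create(
        model="gpt-4",  # the model family used in the study
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": patient_message},
        ],
        temperature=0.2,  # low temperature for a consistent clinical tone
    )
    return completion.choices[0].message.content

if __name__ == "__main__":
    # Hypothetical example message for demonstration only.
    print(draft_pmar_response(
        "My blood pressure readings at home have been around 150/95 this "
        "week. Should I change anything before my next appointment?"
    ))

Consistent with the workflow the abstract describes, such drafts would be surfaced inside the electronic health record for clinician review and editing rather than sent to patients directly.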

Publisher

Oxford University Press (OUP)
