A Generative Pretrained Transformer (GPT)–Powered Chatbot as a Simulated Patient to Practice History Taking: Prospective, Mixed Methods Study (Preprint)

Published: 2023-10-25

Author:

Holderried Friederike, Stegemann-Philipps Christian, Herschbach Lea, Moldt Julia-Astrid, Nevins Andrew, Griewatz Jan, Holderried Martin, Herrmann-Werner Anne, Festl-Wietek Teresa, Mahling Moritz

Abstract

BACKGROUND

Communication is a core competency of medical professionals and of utmost importance for patient safety. Although medical curricula emphasize communication training, traditional formats, such as real or simulated patient interactions, can cause psychological stress and offer limited opportunities for repetition. The recent emergence of large language models (LLMs), such as the generative pretrained transformer (GPT), offers an opportunity to overcome these restrictions.

OBJECTIVE

The aim of this study was to explore the feasibility of a GPT-driven chatbot to practice history taking, one of the core competencies of communication.

METHODS

We developed an interactive chatbot interface using GPT-3.5 and a specific prompt including a chatbot-optimized illness script and a behavioral component. Following a mixed methods approach, we invited medical students to voluntarily practice history taking. To determine whether GPT provides suitable answers as a simulated patient, the conversations were recorded and analyzed using quantitative and qualitative approaches. We analyzed the extent to which the questions and answers aligned with the provided script, as well as the medical plausibility of the answers. Finally, the students filled out the Chatbot Usability Questionnaire (CUQ).

RESULTS

A total of 28 students practiced with our chatbot (mean age 23.4, SD 2.9 years). We recorded a total of 826 question-answer pairs (QAPs), with a median of 27.5 QAPs per conversation and 94.7% (n=782) pertaining to history taking. When questions were explicitly covered by the script (n=502, 60.3%), the GPT-provided answers were mostly based on explicit script information (n=471, 94.4%). For questions not covered by the script (n=195, 23.4%), 56.4% (n=110) of the GPT answers used fictitious information. Regarding plausibility, 842 (97.9%) of 860 QAPs were rated as plausible. The 14 (2.1%) implausible answers were rated as socially desirable, leaving role identity, ignoring script information, illogical reasoning, or calculation error. Despite these implausible answers, the CUQ revealed an overall positive user experience (77/100 points).

CONCLUSIONS

Our data showed that LLMs, such as GPT, can provide a simulated patient experience, yielding a good user experience and a majority of plausible answers. Our analysis revealed that GPT-provided answers either use explicit script information or are based on available information, which can be understood as abductive reasoning. Although rare, the GPT-based chatbot provides implausible information in some instances, with the major tendency being to give socially desirable instead of medically plausible information.

Publisher

JMIR Publications Inc.
