Educational Utility of Clinical Vignettes Generated in Japanese by ChatGPT-4: Mixed Methods Study (Preprint)

Published: 2024-04-04

Authors:

Takahashi Hiromizu, Shikino Kiyoshi, Kondo Takeshi, Komori Akira, Yamada Yuji, Saita Mizue, Naito Toshio

Abstract

BACKGROUND

Evaluating the accuracy and educational utility of artificial intelligence–generated medical cases, especially those produced by large language models such as ChatGPT-4 (developed by OpenAI), is crucial yet underexplored.

OBJECTIVE

This study aimed to assess the educational utility of ChatGPT-4–generated clinical vignettes and their applicability in educational settings.

METHODS

In this convergent mixed methods study, a web-based survey was conducted from January 8 to 28, 2024, to evaluate 18 medical cases generated by ChatGPT-4 in Japanese. The survey used 6 main question items to evaluate the quality and educational utility of the generated clinical vignettes: information quality, information accuracy, educational usefulness, clinical match, terminology accuracy (TA), and diagnosis difficulty. Feedback was solicited from physicians who specialize in general internal medicine or general medicine and are experienced in medical education. Chi-square and Mann-Whitney U tests were performed to identify differences among cases, and linear regression was used to examine trends associated with physicians’ experience. Thematic analysis of the qualitative feedback was performed to identify areas for improvement and to confirm the educational utility of the cases.
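As an illustration of the kind of between-case comparison described above, the following is a minimal sketch of a two-sided Mann-Whitney U test (normal approximation with tie correction) followed by a Bonferroni adjustment. It is not the authors' analysis code: the abstract does not say which software was used, which case pairs were compared, or how many comparisons entered the correction. The function names and the ratings in `ratings` are hypothetical.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) using the normal approximation.

    Applies the standard tie correction to the variance and no
    continuity correction.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all observations identical
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical 5-point Likert ratings for three cases (not study data).
ratings = {
    "case_A": [4, 4, 5, 3, 4, 5, 4, 4],
    "case_B": [2, 3, 2, 3, 2, 1, 3, 2],
    "case_C": [4, 3, 4, 4, 5, 4, 3, 4],
}

pairs = list(combinations(ratings, 2))
adjusted = {}
for a, b in pairs:
    _, p = mann_whitney_u(ratings[a], ratings[b])
    # Bonferroni: multiply by the number of comparisons, cap at 1.
    adjusted[(a, b)] = min(1.0, p * len(pairs))

for pair, p_adj in adjusted.items():
    print(pair, f"adjusted P = {p_adj:.4f}")
```

With many respondents per case the normal approximation is usually adequate. For small samples, an exact test from an established statistics package would be the safer choice.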

RESULTS

Of the 73 invited participants, 71 (97%) responded. The respondents, primarily male (64/71, 90%), spanned a broad range of practice years (from 1976 to 2017) and represented diverse hospital sizes throughout Japan. The majority deemed the information quality (mean 0.77, 95% CI 0.75-0.79) and information accuracy (mean 0.68, 95% CI 0.65-0.71) to be satisfactory; both items were scored as binary responses. On a 5-point Likert scale, the mean scores were 3.55 (95% CI 3.49-3.60) for educational usefulness, 3.70 (95% CI 3.65-3.75) for clinical match, 3.49 (95% CI 3.44-3.55) for TA, and 2.34 (95% CI 2.28-2.40) for diagnosis difficulty. Statistical analysis showed significant variability in content quality and relevance across the cases (P<.001 after Bonferroni correction). Participants suggested improvements in generating physical findings, using natural language, and enhancing medical TA. The thematic analysis highlighted the need for clearer documentation, clinical information consistency, content relevance, and patient-centered case presentations.
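The intervals reported for the two binary items can be sanity-checked with a normal-approximation (Wald) interval for a proportion. The sketch below assumes that each of the 71 respondents rated all 18 cases, giving 71 × 18 = 1278 ratings per item, and that ratings can be treated as independent. The abstract states neither the denominator nor the interval method, so this is a consistency check under those assumptions, not a reproduction of the authors' calculation. The function name `proportion_ci` is hypothetical.

```python
import math


def proportion_ci(p_hat, n, z=1.96):
    """Wald 95% CI for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Assumption: every respondent rated every case (71 respondents x 18 cases).
n_ratings = 71 * 18

for label, p_hat in [("information quality", 0.77),
                     ("information accuracy", 0.68)]:
    low, high = proportion_ci(p_hat, n_ratings)
    print(f"{label}: mean {p_hat:.2f} (95% CI {low:.2f}-{high:.2f})")

# Prints:
# information quality: mean 0.77 (95% CI 0.75-0.79)
# information accuracy: mean 0.68 (95% CI 0.65-0.71)
```

Under these assumptions the computed intervals match the reported ones to two decimal places. Because ratings from the same physician are likely correlated, an analysis that accounts for clustering could yield wider intervals.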

CONCLUSIONS

ChatGPT-4–generated medical cases written in Japanese possess considerable potential as resources in medical education, with recognized adequacy in quality and accuracy. Nevertheless, there is a notable need for enhancements in the precision and realism of case details. This study emphasizes ChatGPT-4’s value as an adjunctive educational tool in the medical field, requiring expert oversight for optimal application.

Publisher

JMIR Publications Inc.
