Can GPT-3.5 generate and code discharge summaries?

Author:

Falis Matúš (1), Gema Aryo Pradipta (1), Dong Hang (2), Daines Luke (3), Basetti Siddharth (4), Holder Michael (5), Penfold Rose S (6, 7), Birch Alexandra (1), Alex Beatrice (8, 9)

Affiliation:

1. School of Informatics, The University of Edinburgh, Edinburgh EH8 9AB, United Kingdom

2. Department of Computer Science, University of Exeter, Exeter EX4 4QF, United Kingdom

3. Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh EH16 4UX, United Kingdom

4. Department of Research, Development and Innovation, National Health Service Highland, Inverness IV2 3JH, United Kingdom

5. Centre for Population Health Sciences, Usher Institute, The University of Edinburgh, Edinburgh EH16 4UX, United Kingdom

6. Ageing and Health, Usher Institute, The University of Edinburgh, Edinburgh EH16 4UX, United Kingdom

7. Advanced Care Research Centre, The University of Edinburgh, Edinburgh EH16 4UX, United Kingdom

8. Edinburgh Futures Institute, The University of Edinburgh, Edinburgh EH3 9EF, United Kingdom

9. School of Literatures, Languages and Cultures, The University of Edinburgh, Edinburgh EH8 9LH, United Kingdom

Abstract

Objectives: The aim of this study was to investigate GPT-3.5 in generating and coding medical documents with International Classification of Diseases (ICD)-10 codes for data augmentation on low-resource labels.

Materials and Methods: Employing GPT-3.5, we generated and coded 9606 discharge summaries based on lists of ICD-10 code descriptions of patients with infrequent codes (referred to as generation codes) within the MIMIC-IV dataset. Combined with the baseline training set, this formed an augmented training set. Neural coding models were trained on baseline and augmented data and evaluated on a MIMIC-IV test set. We report micro- and macro-F1 scores on the full codeset, generation codes, and their families. Weak Hierarchical Confusion Matrices determined within-family and outside-of-family coding errors in the latter codesets. The coding performance of GPT-3.5 was evaluated on prompt-guided self-generated data and real MIMIC-IV data. Clinicians evaluated the clinical acceptability of the generated documents.

Results: Data augmentation results in slightly lower overall model performance but improves performance for the generation candidate codes and their families, including one code absent from the baseline training data. Augmented models display lower out-of-family error rates. GPT-3.5 identifies ICD-10 codes by their prompted descriptions but underperforms on real data. Evaluators highlight the correctness of generated concepts, but the documents suffer in variety, supporting information, and narrative.

Discussion and Conclusion: While GPT-3.5 alone, given our prompt setting, is unsuitable for ICD-10 coding, it supports data augmentation for training neural models. Augmentation positively affects generation code families but mainly benefits codes with existing examples. Augmentation reduces out-of-family errors. Documents generated by GPT-3.5 state prompted concepts correctly but lack variety and authenticity in narratives.
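For readers unfamiliar with the two reported metrics, the difference between micro- and macro-F1 in multi-label coding can be sketched as follows. This is a minimal illustration, not the evaluation code used in the study: the function names, the example codes, and the choice to average macro-F1 over every code that appears in either the gold or the predicted labels are all assumptions made here. Micro-F1 pools counts across all codes, so frequent codes dominate. Macro-F1 averages per-code scores, so rare codes (such as the generation codes) weigh as much as common ones.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, pred):
    """gold, pred: lists of sets of codes, one set per document.

    Assumption: macro-F1 is averaged over codes seen in gold or pred.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for code in g & p:   # predicted and correct
            tp[code] += 1
        for code in p - g:   # predicted but not in gold
            fp[code] += 1
        for code in g - p:   # in gold but missed
            fn[code] += 1
    labels = set(tp) | set(fp) | set(fn)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels) if labels else 0.0
    return micro, macro


if __name__ == "__main__":
    # Illustrative labels only; not drawn from MIMIC-IV.
    gold = [{"I10", "E11.9"}, {"I10"}, {"J45.909"}]
    pred = [{"I10"}, {"I10", "E11.9"}, {"I10"}]
    micro, macro = micro_macro_f1(gold, pred)
    print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")
```

In this toy case the frequent code is predicted well while the two rare codes are missed entirely, so macro-F1 falls well below micro-F1. That gap is why both scores are informative when assessing augmentation aimed at low-resource labels.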

Funder

United Kingdom Research and Innovation

UKRI

Engineering and Physical Sciences Research Council

Wellcome Trust

Legal and General PLC

National Institute for Health Research

Publisher

Oxford University Press (OUP)

