Abstract
Objectives
To explore the potential of Large Language Models (LLMs) to extract and structure information from free-text clinical reports, with a specific focus on identifying and classifying patient comorbidities in oncology electronic health records. We specifically evaluate the gpt-3.5-turbo-1106 and gpt-4-1106-preview models against specialized human evaluators.

Methods
We implemented a script using the OpenAI API to extract comorbidities as structured JSON from 250 personal history reports. The same reports were manually reviewed in batches of 50 by five radiation oncology specialists. We compared the results using sensitivity, specificity, precision, accuracy, F-score, the kappa index, and the McNemar test, and examined the most common causes of error in both humans and the GPT models.

Results
The GPT-3.5 model performed slightly worse than the physicians across all metrics, though the differences were not statistically significant. GPT-4 was clearly superior on several key metrics; notably, it achieved a sensitivity of 96.8%, compared to 88.2% for GPT-3.5 and 88.8% for the physicians. The physicians, however, marginally outperformed GPT-4 in precision (97.7% vs. 96.8%). GPT-4 was also more consistent, replicating its exact results in 76% of the reports across 10 repeated analyses, versus 59% for GPT-3.5. Physicians were more likely to miss explicitly stated comorbidities, whereas the GPT models more often inferred comorbidities that were not explicit, sometimes correctly, though this also produced more false positives.

Conclusion
With carefully designed prompts, the studied LLMs interpret clinical reports, even complex and confusingly written ones, with competence comparable to that of medical specialists. Given their additional advantages in time and cost, these models are a preferable option to human analysis for mining and structuring information in large collections of clinical reports.
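The abstract does not reproduce the authors' script or prompt. As a rough illustration of the extraction setup the Methods describe, here is a minimal sketch in Python, assuming the openai v1 client and the JSON mode available for both studied models; the prompt wording, output schema, and sample report are hypothetical, not the authors' actual materials.

```python
# Minimal sketch of comorbidity extraction via the OpenAI API (assumed setup,
# not the authors' script). Prompt and schema are illustrative placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a clinical data extractor. From the patient's personal history "
    "report, list every comorbidity mentioned. Respond only with JSON of the "
    'form {"comorbidities": [<string>, ...]}.'
)

def extract_comorbidities(report_text: str,
                          model: str = "gpt-4-1106-preview") -> list[str]:
    """Return the comorbidities found in one free-text report."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # favor reproducible output across repeated analyses
        response_format={"type": "json_object"},  # JSON mode, supported by both studied models
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": report_text},
        ],
    )
    return json.loads(response.choices[0].message.content)["comorbidities"]

if __name__ == "__main__":
    print(extract_comorbidities(
        "Type 2 diabetes on metformin. Hypertension. Ex-smoker with COPD."
    ))
```

In this reading, each of the 250 reports would be passed through such a function once per model, with the physicians' batch reviews serving as the comparison standard.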
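For reference, the headline metrics reported above have their standard definitions in terms of true/false positives and negatives (TP, FP, TN, FN) scored against the adjudicated reference; these definitions are conventional and are not spelled out in the abstract itself.

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}, \qquad
\text{Precision} = \frac{TP}{TP + FP}
```

```latex
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}
```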