Abstract
AbstractBackgroundSynoptic reporting, the documenting of clinical information in a structured manner, is known to improve patient care by reducing errors, increasing readability, interoperability, and report completeness. Despite its advantages, manually synthesizing synoptic reports from narrative reports is expensive and error prone when the number of structured fields are many. While the recent revolutionary developments in Large Language Models (LLMs) have significantly advanced natural language processing, their potential for innovations in medicine is yet to be fully evaluated.ObjectivesIn this study, we explore the strengths and challenges of utilizing the state-of-the-art language models in the automatic synthesis of synoptic reports.Materials and MethodsWe use a corpus of 7,774 cancer related, narrative pathology reports, which have annotated reference synoptic reports from Mayo Clinic EHR. Using these annotations as a reference, we reconfigure the state-of-the-art large language models, such as LLAMA-2, to generate the synoptic reports. Our annotated reference synoptic reports contain 22 unique data elements. To evaluate the accuracy of the reports generated by the LLMs, we use several metrics including the BERT F1 Score and verify our results by manual validation.ResultsWe show that using fine-tuned LLAMA-2 models, we can obtain BERT Score F1 of 0.86 or higher across all data elements and BERT F1 scores of 0.94 or higher on over 50% (11 of 22) of the questions. The BERT F1 scores translate to average accuracies of 76% and as high as 81% for short clinical reports.ConclusionsWe demonstrate successful automatic synoptic report generation by fine-tuning large language models.
Publisher
Cold Spring Harbor Laboratory