Affiliation:
1. School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, Texas, USA
2. McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
3. Division of General Internal Medicine, McGovern Medical School, The University of Texas Health Science Center at Houston, Houston, Texas, USA
Abstract
Objectives
Scanned documents (SDs), while common in electronic health records and potentially rich in clinically relevant information, rarely fit well with clinician workflow. Here, we identify scanned imaging reports requiring follow-up with high recall and practically useful precision.
Materials and methods
We focused on imaging findings tied to 3 common causes of malpractice claims: (1) potentially malignant breast lesions (mammography), (2) potentially malignant lung lesions (chest computed tomography [CT]), and (3) long-bone fractures (X-ray). We trained our ClinicalBERT-based pipeline on existing typed/dictated reports classified manually or using ICD-10 codes, evaluated it on a test set of manually classified SDs, and compared it against a string-matching baseline.
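As a rough illustration of what a string-matching baseline of this kind can look like, the sketch below flags a report when it contains any term from a keyword list. The term list and the function name `flag_report` are hypothetical; the abstract does not state which strings the authors matched or how they handled negation.

```python
import re

# Hypothetical keyword list, for illustration only; the abstract does not
# give the actual terms used by the string-matching baseline.
ABNORMAL_TERMS = ["suspicious", "mass", "nodule", "fracture"]

# One case-insensitive pattern that matches any term as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in ABNORMAL_TERMS) + r")\b",
    re.IGNORECASE,
)


def flag_report(text: str) -> bool:
    """Return True if the report text contains any abnormal-finding term."""
    return _PATTERN.search(text) is not None


flagged = flag_report("There is a 6 mm nodule in the right upper lobe.")
not_flagged = flag_report("No acute abnormality. Lungs are clear.")
```

A matcher this simple also flags negated mentions such as "no fracture", which is one reason a baseline of this sort may trade precision for simplicity.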
Results
A total of 393 mammogram, 305 chest CT, and 683 bone X-ray reports were manually reviewed. The string-matching approach had an F1 of 0.667. For mammograms, chest CTs, and bone X-rays, respectively, models trained on manually classified training data and optimized for F1 reached F1 scores of 0.900, 0.905, and 0.817, while separate models optimized for recall achieved a recall of 1.000 with precisions of 0.727, 0.518, and 0.275. Models trained on ICD-10-labelled data and optimized for F1 achieved F1 scores of 0.647, 0.830, and 0.643, while those optimized for recall achieved a recall of 1.000 with precisions of 0.407, 0.683, and 0.358.
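The trade-off between a recall of 1.000 and the accompanying precision can be made concrete with a small sketch. The abstract does not say how the models were optimized for recall; the code below assumes one common approach, lowering the decision threshold on a classifier's scores until every abnormal report is flagged. The labels and scores are toy values, not data from the study.

```python
def precision_recall_f1(labels, scores, threshold):
    """Compute precision, recall, and F1 when flagging scores >= threshold."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def full_recall_threshold(labels, scores):
    """Highest threshold that still flags every positive example.

    This is the lowest score assigned to any positive (abnormal) report.
    """
    return min(s for y, s in zip(labels, scores) if y == 1)


# Toy data (assumed): 1 = abnormal report, 0 = normal report.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.5, 0.1, 0.8, 0.3]

threshold = full_recall_threshold(labels, scores)
precision, recall, f1 = precision_recall_f1(labels, scores, threshold)
```

In practice a threshold like this would be chosen on held-out validation data rather than on the test set, and a recall of 1.000 on one sample does not guarantee it on new reports.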
Discussion
Our pipeline can identify abnormal reports with potentially useful performance and so decrease the manual effort required to screen for abnormal findings that require follow-up.
Conclusion
It is possible to automatically identify clinically significant abnormalities in SDs with high recall and practically useful precision in a generalizable and minimally laborious way.
Funder
National Center for Advancing Translational Sciences
Cancer Prevention and Research Institute of Texas
Reynolds and Reynolds Professorship in Clinical Informatics
National Institute of Biomedical Imaging and Bioengineering (NIBIB)
Publisher
Oxford University Press (OUP)
Cited by
5 articles.