Searching the PDF Haystack: Automated Knowledge Discovery in Scanned EHR Documents

Author:

Kostrinsky-Thomas Alexander L. (1), Hisama Fuki M. (2), Payne Thomas H. (3)

Affiliation:

1. College of Osteopathic Medicine, Pacific Northwest University of Health Sciences, 200 University Pkwy, Yakima, Washington, United States

2. Division of Medical Genetics, Department of Medicine, University of Washington School of Medicine, Seattle, Washington, United States

3. Department of Medicine, University of Washington School of Medicine, Seattle, Washington, United States

Abstract

Background: Clinicians express concern that they may be unaware of important information in the voluminous scanned and other outside documents stored in electronic health records (EHRs). An example is "unrecognized EHR risk factor information," defined as risk factors for heritable cancer that exist within a patient's EHR but are not known by current treating providers. In a related study using manual EHR chart review, we found that half of the women whose EHR contained risk factor information met criteria for further genetic risk evaluation for heritable forms of breast and ovarian cancer. These women had not been referred for genetic counseling.

Objectives: The purpose of this study was to compare automated methods (optical character recognition with natural language processing) with human review in their ability to identify risk factors for heritable breast and ovarian cancer within EHR scanned documents.

Methods: We evaluated accuracy by comparing an automated method against our criterion standard (physician chart review). The automated method combined Amazon's Textract service (Amazon.com, Seattle, Washington, United States), the Clinical Language Annotation, Modeling, and Processing (CLAMP) toolkit (Center for Computational Biomedicine at The University of Texas Health Science Center at Houston, Houston, Texas, United States), and a custom-written Java application.

Results: We found that automated methods identified most of the cancer risk factor information that would otherwise require manual clinician review and is therefore at risk of being missed.

Conclusion: The use of automated methods for identification of heritable risk factors within EHRs may provide an accurate yet rapid review of patients' past medical histories. These methods could be further strengthened via improved analysis of handwritten notes, tables, and colloquial phrases.
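As an illustration of the comparison described in the Methods, the sketch below computes two common agreement measures between an automated method and a physician-review criterion standard. It is a minimal sketch under stated assumptions, not the authors' application: the class name `RiskFactorAgreement`, the choice of sensitivity and positive predictive value as measures, and the example findings are all hypothetical, and the abstract does not report which metrics the study used.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch: agreement between automated risk factor findings and a
 * physician chart review used as the criterion standard.
 * Each finding is represented as an opaque string key (assumption).
 */
final class RiskFactorAgreement {

    /** Findings present in both sets. */
    static Set<String> overlap(Set<String> reference, Set<String> automated) {
        Set<String> both = new HashSet<>(reference);
        both.retainAll(automated);
        return both;
    }

    /** Share of criterion-standard findings that the automated method also found. */
    static double sensitivity(Set<String> reference, Set<String> automated) {
        if (reference.isEmpty()) {
            return Double.NaN; // undefined when there is nothing to find
        }
        return (double) overlap(reference, automated).size() / reference.size();
    }

    /** Share of automated findings that the criterion standard confirms. */
    static double positivePredictiveValue(Set<String> reference, Set<String> automated) {
        if (automated.isEmpty()) {
            return Double.NaN; // undefined when nothing was flagged
        }
        return (double) overlap(reference, automated).size() / automated.size();
    }

    public static void main(String[] args) {
        // Made-up example data, not study results.
        Set<String> physicianReview = Set.of(
                "doc1|breast cancer|mother",
                "doc2|ovarian cancer|sister",
                "doc3|BRCA1 mutation|self",
                "doc4|breast cancer|aunt");
        Set<String> automatedMethod = Set.of(
                "doc1|breast cancer|mother",
                "doc2|ovarian cancer|sister",
                "doc3|BRCA1 mutation|self",
                "doc5|colon cancer|father");

        System.out.printf("sensitivity = %.2f%n",
                sensitivity(physicianReview, automatedMethod));
        System.out.printf("PPV = %.2f%n",
                positivePredictiveValue(physicianReview, automatedMethod));
    }
}
```

Representing each finding as a single string key keeps the comparison to plain set operations; a real pipeline would likely need normalization of terms and family relations before two findings could be treated as the same.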

Publisher

Georg Thieme Verlag KG

Subject

Health Information Management, Computer Science Applications, Health Informatics
