Customize Deep Learning-based De-Identification Systems Using Local Clinical Notes - A Study of Sample Size

Authors:

Yang Xi, Bian Jiang, Wu Yonghui

Abstract

Electronic Health Records (EHRs) are a valuable resource for both clinical and translational research. However, much detailed patient information is embedded in clinical narratives, which also contain a large amount of patients' identifiable information. De-identification of clinical notes is a critical technology for protecting patient privacy and confidentiality. Previous studies have presented many automated de-identification systems that capture and remove protected health information (PHI) from clinical text. However, most of them were evaluated only in a single-institution setting, where the training and test data came from the same institution. Applying these systems directly to another institution without customization could lead to a dramatic drop in performance. Recent studies have shown that fine-tuning is a promising method for customizing deep learning-based NLP systems across institutions. However, it is still unclear how much local data is required. In this study, we examined the customization of a deep learning-based de-identification system using local notes from UF Health in different sample sizes. Our results showed that fine-tuning could significantly improve model performance even with a small local dataset. Yet, once the local data exceeded a threshold (700 notes in this study), the performance improvement became marginal.
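The abstract reports that gains from fine-tuning become marginal once the local training set passes a threshold size. A minimal sketch of one way to make "marginal" operational on a learning curve (F1 score versus number of local notes) follows. The function name `find_plateau`, the "F1 gain per 100 added notes" criterion, and the default cutoff are hypothetical illustrations chosen here, not the authors' analysis, and the example values are synthetic.

```python
from typing import Optional, Sequence


def find_plateau(
    sizes: Sequence[int],
    scores: Sequence[float],
    min_gain_per_100: float = 0.005,
) -> Optional[int]:
    """Return the smallest training-set size from which every later step on the
    learning curve adds less than `min_gain_per_100` F1 per 100 extra notes.

    Hypothetical criterion for illustration only; not the study's method.
    Returns None if the curve never flattens under this criterion.
    """
    if len(sizes) != len(scores):
        raise ValueError("sizes and scores must have the same length")
    # Sort points by training-set size so gains are computed left to right.
    pairs = sorted(zip(sizes, scores))
    if len({s for s, _ in pairs}) != len(pairs):
        raise ValueError("sizes must be distinct")

    # Slope of each segment, expressed as F1 gain per 100 additional notes.
    gains = [
        (f1 - f0) / (s1 - s0) * 100
        for (s0, f0), (s1, f1) in zip(pairs, pairs[1:])
    ]

    # Require all remaining segments to be flat, so one noisy dip does not
    # trigger an early plateau.
    for i in range(len(gains)):
        if all(g < min_gain_per_100 for g in gains[i:]):
            return pairs[i][0]
    return None


if __name__ == "__main__":
    # Synthetic learning curve, not data from the study.
    print(find_plateau([50, 150, 250, 350], [0.70, 0.80, 0.84, 0.842]))
```

On this synthetic curve the function returns 250, because the only segment after 250 notes adds 0.002 F1 per 100 notes, below the assumed 0.005 cutoff. In practice the cutoff would need to be chosen against the variance of repeated fine-tuning runs at each sample size.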

Publisher

Cold Spring Harbor Laboratory
