Protected Health Information filter (Philter): accurately and securely de-identifying free-text clinical notes

Published: 2020-04-14 Issue: 1 Volume: 3
ISSN: 2398-6352
Container-title: npj Digital Medicine
Language: en
Short-container-title: npj Digit. Med.

Author:

Norgeot Beau, Muenzen Kathleen, Peterson Thomas A., Fan Xuancheng, Glicksberg Benjamin S., Schenk Gundolf, Rutenberg Eugenia, Oskotsky Boris, Sirota Marina, Yazdany Jinoos, Schmajuk Gabriela, Ludwig Dana, Goldstein Theodore, Butte Atul J.

Abstract

There is a great and growing need to ascertain what exactly is the state of a patient, in terms of disease progression, actual care practices, pathology, adverse events, and much more, beyond the paucity of data available in structured medical record data. Ascertaining these harder-to-reach data elements is now critical for the accurate phenotyping of complex traits, detection of adverse outcomes, efficacy of off-label drug use, and longitudinal patient surveillance. Clinical notes often contain the most detailed and relevant digital information about individual patients, the nuances of their diseases, the treatment strategies selected by physicians, and the resulting outcomes. However, notes remain largely unused for research because they contain Protected Health Information (PHI), which is synonymous with individually identifying data. Previous clinical note de-identification approaches have been rigid and still too inaccurate to see any substantial real-world use, primarily because they have been trained on medical text corpora that were too small. To build a new de-identification tool, we created the largest manually annotated clinical note corpus for PHI and developed a customizable open-source de-identification software called Philter (“Protected Health Information filter”). Here we describe the design and evaluation of Philter, and show how it offers substantial real-world improvements over prior methods.

Funder

Achievement Rewards for College Scientists Foundation

Silicon Valley Community Foundation

U.S. Department of Health & Human Services | National Institutes of Health

U.S. Department of Health & Human Services | NIH | U.S. National Library of Medicine

U.S. Department of Health & Human Services | Agency for Healthcare Research and Quality

Publisher

Springer Science and Business Media LLC

Subject

Health Information Management, Health Informatics, Computer Science Applications, Medicine (miscellaneous)

Link

http://www.nature.com/articles/s41746-020-0258-y.pdf

References: 25 articles.

1. Kirby, J. C. et al. PheKB: a catalog and workflow for creating electronic phenotype algorithms for transportability. J. Am. Med. Inf. Assoc. 23, 1046–1052 (2016).

2. Norgeot, B., Glicksberg, B. S. & Butte, A. J. A call for deep-learning healthcare. Nat. Med. 25, 14–15 (2019).

3. Makary, M. A. & Daniel, M. Medical error-the third leading cause of death in the US. BMJ 353, i2139 (2016).

4. O’Malley, K. J. et al. Measuring diagnoses: ICD code accuracy. Health Serv. Res. 40, 1620–1639 (2005).

5. Iqbal, E. et al. ADEPt, a semantically-enriched pipeline for extracting adverse drug events from free-text electronic health records. PLoS ONE 12, e0187121 (2017).

Cited by 54 articles.

1. Comparison of Diagnosis Codes to Clinical Notes in Classifying Patients with Diabetic Retinopathy; Ophthalmology Science; 2024-11

2. Masketeer: An Ensemble-Based Pseudonymization Tool with Entity Recognition for German Unstructured Medical Free Text; Future Internet; 2024-08-06

3. “My Mom Is a Fighter”; CHEST; 2024-08

4. Beyond Individual Concerns: Multi-user Privacy in Large Language Models; ACM Conversational User Interfaces 2024; 2024-07-08

5. Harnessing EHR data for health research; Nature Medicine; 2024-07