Assessing the Difficulty and Time Cost of De-identification in Clinical Narratives

Published:2006 Issue:03 Volume:45 Page:246-252
ISSN:0026-1270
Container-title:Methods of Information in Medicine
language:en
Short-container-title:Methods Inf Med

Author:

Phillips W. F., Phansalkar S., Sims S. A., Hurdle J. F., Dorr D. A.

Abstract

Objective: To characterize the difficulty confronting investigators in removing protected health information (PHI) from cross-discipline, free-text clinical notes, an important challenge to clinical informatics research as recalibrated by the introduction of the US Health Insurance Portability and Accountability Act (HIPAA) and similar regulations.

Methods: Randomized selection of clinical narratives from complete admissions written by diverse providers, reviewed using a two-tiered rater system and simple automated regular expression tools. For manual review, two independent reviewers used simple search and replace algorithms and visual scanning to find PHI as defined by HIPAA, followed by an independent second review to detect any missed PHI. Simple automated review was also performed for the “easy” PHI that are number- or date-based.

Results: From 262 notes, 2074 PHI, or 7.9 ± 6.1 per note, were found. The average recall (or sensitivity) was 95.9% while precision was 99.6% for single reviewers. Agreement between individual reviewers was strong (ICC = 0.99), although some asymmetry in errors was seen between reviewers (p = 0.001). The automated technique had better recall (98.5%) but worse precision (88.4%) for its subset of identifiers. Manually de-identifying a note took 87.3 ± 61 seconds on average.

Conclusions: Manual de-identification of free-text notes is tedious and time-consuming, but even simple PHI is difficult to automatically identify with the exactitude required under HIPAA.
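The abstract's automated step (regular expressions for number- or date-based PHI, scored by recall and precision) can be illustrated with a minimal sketch. The patterns, the names `PHI_PATTERNS`, `find_phi`, and `recall_precision`, and the sample note are all hypothetical; they are assumptions for illustration and not the expressions or data used in the study.

```python
import re

# Hypothetical patterns for "easy" number- or date-based identifiers.
# Illustrative assumptions only, not the study's actual expressions.
PHI_PATTERNS = {
    "date": re.compile(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_phi(text):
    """Return the set of (start, end, label) spans matched by any pattern."""
    spans = set()
    for label, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.add((match.start(), match.end(), label))
    return spans


def recall_precision(found, gold):
    """Score detected spans against reviewer-confirmed (gold) spans."""
    true_positives = len(found & gold)
    recall = true_positives / len(gold) if gold else 1.0
    precision = true_positives / len(found) if found else 1.0
    return recall, precision


# Made-up note: one real date, one real phone number, one dosing string
# that merely looks like a date.
note = "Admitted 03/14/2005. Call 801-555-0123. Titrate 1-2-10 units."
found = find_phi(note)

# Gold standard as a human reviewer might mark it: everything except
# the dosing string "1-2-10".
gold = {span for span in found if note[span[0]:span[1]] != "1-2-10"}

recall, precision = recall_precision(found, gold)
print(f"recall={recall:.3f} precision={precision:.3f}")
```

In this toy note the dosing string `1-2-10` is flagged as a date, so recall stays at 1.0 while precision drops to 2/3. That is the same direction of trade-off the abstract reports for the automated technique: high recall, lower precision.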

Publisher

Georg Thieme Verlag KG

Subject

Health Information Management, Advanced and Specialised Nursing, Health Informatics

Link

http://www.thieme-connect.de/products/ejournals/pdf/10.1055/s-0038-1634080.pdf

Cited by 31 articles.

1. De-identification of clinical free text using natural language processing: A systematic review of current approaches;Artificial Intelligence in Medicine;2024-05

2. Qualitative Data Reuse in Practice;Synthesis Lectures on Information Concepts, Retrieval, and Services;2024

3. PIILO: an open-source system for personally identifiable information labeling and obfuscation;Information and Learning Sciences;2023-10-18

4. Named Entity Recognition for De-identifying Real-World Health Records in Spanish;Computational Science – ICCS 2023;2023

5. Evaluation of an automated Presidio anonymisation model for unstructured radiation oncology electronic medical records in an Australian setting;International Journal of Medical Informatics;2022-12