Measuring the impact of anonymization on real-world consolidated health datasets engineered for secondary research use: Experiments in the context of MODELHealth project

Published: 2022-09-01 Volume: 4
ISSN: 2673-253X
Container-title: Frontiers in Digital Health
Short-container-title: Front. Digit. Health

Author:

Pitoglou Stavros, Filntisi Arianna, Anastasiou Athanasios, Matsopoulos George K., Koutsouris Dimitrios

Abstract

Introduction: Electronic Health Records (EHRs) are essential data structures, enabling the sharing of valuable medical care information for a diverse patient population and being reused as input to predictive models for clinical research. However, issues such as the heterogeneity of EHR data and the potential compromise of patient privacy inhibit the secondary use of EHR data in clinical research.

Objectives: This study aims to present the main elements of the MODELHealth project implementation and the evaluation method that was followed to assess the efficiency of its mechanism.

Methods: The MODELHealth project was implemented as an Extract-Transform-Load system that collects data from the hospital databases and performs harmonization to the HL7 FHIR standard and anonymization using the k-anonymity method before loading the transformed data to a central repository. The integrity of the anonymization process was validated by developing a database query tool. The information loss occurring due to the anonymization was estimated with the metrics of generalized information loss, discernibility and average equivalence class size for various values of k.

Results: The average values of generalized information loss, discernibility and average equivalence class size obtained across all tested datasets and k values were 0.008473 ± 0.006216252886, 115,145,464.3 ± 79,724,196.11 and 12.1346 ± 6.76096647, respectively. The values of those metrics appear correlated with factors such as the k value and the dataset characteristics, as expected.

Conclusion: The experimental results of the study demonstrate that it is feasible to perform effective harmonization and anonymization on EHR data while preserving essential patient information.
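As a rough illustration of the k-anonymity check and two of the metrics named in the abstract, the sketch below groups records by their quasi-identifier values and computes discernibility and average equivalence class size. It is not the MODELHealth implementation: the function names, the sample records and the choice of quasi-identifiers are hypothetical, discernibility is taken in its common form as the sum of squared class sizes (ignoring suppressed records), and the average class size is the plain mean, which may differ from the normalized variant the paper may have used. Generalized information loss is omitted because it needs the generalization hierarchies, which the abstract does not describe.

```python
from collections import Counter


def equivalence_classes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def is_k_anonymous(classes, k):
    """A table is k-anonymous if every equivalence class holds at least k records."""
    return all(size >= k for size in classes.values())


def discernibility(classes):
    """Sum of squared class sizes (assumed definition, no suppressed records)."""
    return sum(size * size for size in classes.values())


def average_class_size(classes):
    """Plain mean number of records per equivalence class (assumed definition)."""
    return sum(classes.values()) / len(classes)


# Hypothetical, already generalized records; "dx" is the sensitive attribute.
records = [
    {"age": "30-39", "zip": "115**", "dx": "I10"},
    {"age": "30-39", "zip": "115**", "dx": "E11"},
    {"age": "30-39", "zip": "115**", "dx": "J45"},
    {"age": "40-49", "zip": "116**", "dx": "I10"},
    {"age": "40-49", "zip": "116**", "dx": "I25"},
]

classes = equivalence_classes(records, ["age", "zip"])
print(is_k_anonymous(classes, 2))   # smallest class has 2 records
print(discernibility(classes))      # 3*3 + 2*2
print(average_class_size(classes))  # 5 records over 2 classes
```

Larger classes raise both metrics, which is consistent with the abstract's remark that the values track the chosen k and the dataset characteristics.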

Publisher

Frontiers Media SA

Subject

General Engineering

References: 44 articles.

1. A review of PHR, EMR and EHR integration: a more personalized healthcare and public health policy;Heart;Health Policy Technol,2017

2. Publishing data from electronic health records while preserving privacy: a survey of algorithms;Gkoulalas-Divanis;J Biomed Inform,2014

3. Quantifying the costs and benefits of privacy-preserving health data publishing;Khokhar;J Biomed Inform,2014

4. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research;Weiskopf;J Am Med Inform Assoc,2013

5. Deep patient: an unsupervised representation to predict the future of patients from the electronic health records;Miotto;Sci Rep,2016

Cited by 3 articles.

1. A systematic review of the use of FHIR to support clinical research, public health and medical education;Data Technologies and Applications;2024-09-03

2. Artificial Intelligence Models in Health Information Exchange: A Systematic Review of Clinical Implications;Healthcare;2023-09-19

3. Data Quality– and Utility-Compliant Anonymization of Common Data Model–Harmonized Electronic Health Record Data: Protocol for a Scoping Review;JMIR Research Protocols;2023-08-11