Affiliations:
1. University of Massachusetts Lowell, Lowell, MA
2. University of Massachusetts Medical School, Worcester, MA
Abstract
Medical and health data are often collected to study a specific disease. In such same-disease microdata, every record carries the same diagnosis, so a privacy disclosure occurs as soon as an individual is known to be in the microdata. Individuals in same-disease microdata are therefore subject to higher disclosure risk than those in microdata covering different diseases. This important problem has been overlooked in data-privacy research and practice, and no prior study has addressed it. In this study, we analyze the disclosure risk for individuals in same-disease microdata and propose a new metric that is appropriate for measuring disclosure risk in this situation. An efficient algorithm is designed and implemented to anonymize same-disease data, minimizing disclosure risk while preserving as much data utility as possible. An experimental study was conducted on real patient and population data. The results show that traditional reidentification risk measures underestimate the actual disclosure risk for individuals in same-disease microdata, and they demonstrate that the proposed approach is very effective in reducing the actual risk for same-disease data. This study suggests that privacy protection policy and practice for sharing medical and health data should consider not only individuals' identifying attributes but also the health and disease information contained in the data. It is recommended that data-sharing entities employ a statistical approach, rather than the HIPAA Safe Harbor policy, when sharing same-disease microdata.
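The abstract does not define the proposed metric, so the following is only a simplified illustration of why membership matters, not the authors' metric or anonymization algorithm. It assumes a population table and a released sample that share the same quasi-identifier attributes (for example age band, ZIP3, and sex), with every sampled individual also present in the population. For each quasi-identifier group, it contrasts a conventional population-based reidentification risk, 1/F (the chance of linking a known person to one specific record), with a membership risk, f/F (the chance that a person known to match the group appears in the sample, which in same-disease data also reveals the diagnosis). Here f and F are the sample and population counts of the group; because f ≥ 1, the membership risk is never smaller than 1/F. The function name `per_group_risks` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

# Hypothetical type: one quasi-identifier combination, e.g. (age band, ZIP3, sex).
QI = Tuple[Hashable, ...]


def per_group_risks(sample: List[QI], population: List[QI]) -> Dict[QI, Tuple[float, float]]:
    """Illustrative sketch (not the paper's metric).

    Returns {qi: (reidentification_risk, membership_risk)} for each QI group in the sample:
      reidentification_risk = 1 / F  (link a known person to one specific record)
      membership_risk       = f / F  (a matching person is in the same-disease sample)
    """
    f = Counter(sample)      # f: number of sample records per QI group
    F = Counter(population)  # F: number of population members per QI group
    result = {}
    for qi, f_k in f.items():
        F_k = F[qi]
        # Assumes the sample is drawn from the population, so F_k >= f_k >= 1.
        if F_k < f_k:
            raise ValueError(f"population count for {qi} is smaller than sample count")
        result[qi] = (1 / F_k, f_k / F_k)
    return result


# Toy data (hypothetical): 14 people in the population, 6 of them in the same-disease sample.
population = [("30-39", "017", "F")] * 10 + [("40-49", "016", "M")] * 4
sample = [("30-39", "017", "F")] * 5 + [("40-49", "016", "M")]

for qi, (reid, member) in per_group_risks(sample, population).items():
    print(qi, reid, member)
# ('30-39', '017', 'F') 0.1 0.5
# ('40-49', '016', 'M') 0.25 0.25
```

In the first group, a record-level measure reports a 10% risk, yet anyone known to be a woman aged 30 to 39 in ZIP3 017 has a 50% chance of being in the sample and thus of having the disease. This gap is one way to see how record-linkage measures can understate disclosure risk for same-disease data.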
Funder
National Institute of Arthritis and Musculoskeletal and Skin Diseases
U.S. National Library of Medicine
Publisher
Association for Computing Machinery (ACM)
Subject
Information Systems and Management, Information Systems
Cited by
11 articles.
1. HIV Client Perspectives on Digital Health in Malawi; Proceedings of the CHI Conference on Human Factors in Computing Systems; 2024-05-11
2. Algorithms to anonymize structured medical and healthcare data: A systematic review; Frontiers in Bioinformatics; 2022-12-22
3. A Privacy Protection Method for Medical Health Data; 2022 IEEE Smartworld, Ubiquitous Intelligence & Computing, Scalable Computing & Communications, Digital Twin, Privacy Computing, Metaverse, Autonomous & Trusted Vehicles (SmartWorld/UIC/ScalCom/DigitalTwin/PriComp/Meta); 2022-12
4. Regulatory Framework around Data Governance and External Benchmarking; Journal of Legal Affairs and Dispute Resolution in Engineering and Construction; 2022-05
5. Tensions and Mitigations: Understanding Concerns and Values around Smartphone Data Collection for Public Health Emergencies; Proceedings of the ACM on Human-Computer Interaction; 2021-10-13