Abstract
Background
Data privacy is one of the biggest challenges for any organisation that processes personal data, especially in medical research, where data include sensitive information about patients and study participants. Sharing such data is therefore problematic, which is at odds with the principle of open data that is so important to the advancement of society and science. Several statistical methods and computational tools have been developed to help data custodians and analysts overcome this challenge.
Methods
In this paper, we propose a new deterministic approach for anonymising personal data. The method stratifies the underlying data by the categorical variables and redistributes the continuous variables within each stratum through an algorithm based on k nearest neighbours.
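The two steps named above, stratification followed by a nearest-neighbour redistribution, can be illustrated with a minimal sketch. The abstract does not specify how the continuous values are redistributed, so the sketch assumes each record is replaced by the centroid of its k nearest neighbours (itself included) within its stratum, using Euclidean distance. The function name `knn_centroid_anonymise`, its parameters, and the handling of strata with fewer than k records are hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_centroid_anonymise(strata, values, k=3):
    """Illustrative only: replace each record's continuous values with the
    centroid of its k nearest neighbours inside the same stratum.

    ASSUMPTION: centroid replacement and Euclidean distance are choices made
    for this sketch; the abstract only says "k nearest neighbours based".
    """
    values = np.asarray(values, dtype=float)
    if values.ndim == 1:
        values = values[:, None]  # treat a 1-D input as a single variable
    strata = list(strata)
    out = np.empty_like(values)

    # Step 1: stratify by the (combined) categorical label.
    for label in dict.fromkeys(strata):  # unique labels, first-seen order
        idx = np.array([i for i, s in enumerate(strata) if s == label])
        block = values[idx]

        # Step 2: pairwise Euclidean distances within the stratum.
        diffs = block[:, None, :] - block[None, :, :]
        dist = np.sqrt((diffs ** 2).sum(axis=2))

        # ASSUMPTION: a stratum smaller than k uses all of its records.
        kk = min(k, len(idx))

        # A stable sort keeps tie-breaking fixed, so the output is deterministic.
        nearest = np.argsort(dist, axis=1, kind="stable")[:, :kk]

        # Step 3: each record becomes the centroid of its neighbourhood.
        out[idx] = block[nearest].mean(axis=1)
    return out


# Toy data: one categorical variable and one continuous variable.
anonymised = knn_centroid_anonymise(["A", "A", "A", "B"],
                                    [0.0, 1.0, 10.0, 100.0], k=2)
print(anonymised.ravel())
```

In this toy example the single-record stratum "B" is returned unchanged, which shows why very small strata would need separate treatment in practice, for instance by merging or suppressing them.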
Results
We demonstrate the use of the deterministic anonymisation on real data, including data from a sample of Titanic passengers, and data from participants in the 1958 Birth Cohort.
Conclusions
The proposed procedure makes data re-identification difficult while minimising the loss of utility (by preserving the spatial properties of the underlying data); the latter means that informative statistical analysis can still be conducted.
Publisher
Springer Science and Business Media LLC
Subject
Health Informatics, Health Policy, Computer Science Applications
Cited by
3 articles.