The trade-off between the risk of disclosure and data utility in SDC: A case of data from a survey of accidents at work

Author:

Młodak Andrzej (1, 2), Pietrzak Michał (1), Józefowski Tomasz (3, 1)

Affiliation:

1. Statistical Office in Poznań, Centre for Small Area Estimation, Poznań, Poland

2. Inter-faculty Department of Mathematics and Statistics, Calisia University – Kalisz, Poland

3. Poznań University of Economics and Business, Poznań, Poland

Abstract

One of the key problems in Statistical Disclosure Control (SDC) is ensuring an optimal trade-off between minimizing the risk of unit identification and maximizing the utility of the data to be disseminated (which means minimizing the information loss caused by applying SDC methods). In practice, this is usually achieved by defining how much risk can be accepted for any given unit and then modifying the data set so that the risk stays below the preset threshold while utility is maximized. Moreover, variables from statistical surveys vary not only in their measurement scale but also in the role they play in the SDC process. All these aspects should therefore be taken into account when looking for this trade-off. In the paper we present a way of assessing whether an optimal trade-off has been achieved. Two main aspects of measuring the risk of disclosure are discussed. The first is internal risk, i.e. the risk of disclosing confidential information solely on the basis of the disseminated microdata after the application of SDC (i.e. no attempt is made to combine the data with external information); the second is external risk, which arises when the user has access to an alternative data set containing information that can be linked with the statistical data in order to identify a unit. We show that it is possible to measure external risk and information loss while accounting for the measurement scale of variables. In our empirical study we used data from an annual survey of accidents at work for 2017. We compared complex information loss and the risk of disclosure in the original data files and those subjected to SDC using methods implemented in the new working version of the sdcMicro R package. We present the underlying assumptions and results of the SDC process, highlighting the benefits and drawbacks of the tools used in the study, which was conducted in 2020 and 2021 in the Centre for Small Area Estimation at the Statistical Office in Poznań.
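
The threshold-based trade-off described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method and does not use the sdcMicro API: the function names, the made-up toy records, the 0.5 threshold, the frequency-based risk (one over the number of records sharing the same key values) and the range-normalized loss for a numeric variable are all assumptions chosen only for illustration.

```python
from collections import Counter


def unit_risks(records, key_variables):
    """Illustrative per-record risk: 1 / number of records sharing the same key values."""
    keys = [tuple(r[v] for v in key_variables) for r in records]
    counts = Counter(keys)
    return [1.0 / counts[k] for k in keys]


def numeric_loss(original, protected, variable):
    """Illustrative loss for a numeric variable: mean absolute change divided by the original range."""
    values = [r[variable] for r in original]
    value_range = max(values) - min(values)
    total_change = sum(abs(o[variable] - p[variable]) for o, p in zip(original, protected))
    return total_change / len(original) / value_range


def recode_age(record, width=10):
    """Replace exact age with the midpoint of its band (a simple global recoding)."""
    low = (record["age"] // width) * width
    return {**record, "age": low + width / 2}


# Toy records invented for this sketch; they are not survey data.
original = [
    {"age": 23, "sector": "construction"},
    {"age": 27, "sector": "construction"},
    {"age": 41, "sector": "transport"},
    {"age": 45, "sector": "transport"},
]
protected = [recode_age(r) for r in original]

threshold = 0.5  # hypothetical maximum acceptable risk per unit
risk_before = max(unit_risks(original, ["age", "sector"]))
risk_after = max(unit_risks(protected, ["age", "sector"]))
loss = numeric_loss(original, protected, "age")

print(f"max risk before: {risk_before}, after: {risk_after}, "
      f"within threshold: {risk_after <= threshold}, information loss: {loss:.3f}")
```

In this toy case every original record is unique on the key variables, while recoding age into bands pairs the records up at the cost of a small distortion in age. A wider band would lower the risk further and raise the loss, which is the trade-off the paper sets out to assess.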

Publisher

IOS Press

Subject

Statistics, Probability and Uncertainty; Economics and Econometrics; Management Information Systems
