CVDP k-means clustering algorithm for differential privacy based on coefficient of variation

Author:

Kong Yuting (1,2,3), Qian Yurong (1,2,3), Tan Fuxiang (1,2,3), Bai Lu (1,2,3), Shao Jinxin (1,2,3), Ma Tinghuai (4), Tereshchenko Sergei Nikolayevich (5)

Affiliation:

1. School of Software, Xinjiang University, Urumqi, Xinjiang Uygur Autonomous Region, China

2. Key Laboratory of Signal Detection and Processing in Xinjiang Uygur Autonomous Region, Xinjiang University, Urumqi, China

3. Key Laboratory of Software Engineering, Xinjiang University, Urumqi, China

4. Nanjing University of Information Science & Technology, Nanjing, China

5. Novosibirsk State University of Economics and Management (NSUEM), Russia

Abstract

Data clustering is widely used across industries and helps enterprises optimize their services. However, when the data to be analyzed contains users' personal information, the clustering analysis carried out by the data holder may expose that information. The differential privacy k-means algorithm is a clustering method built on differential privacy that addresses privacy disclosure during clustering. It adds Laplace noise, controlled by the privacy parameter ε, to the cluster centers to protect sensitive user information in the original data and in the clustering results, but the added noise reduces the utility of the clustering. To balance the availability and privacy of differentially private k-means, existing improvements focus mainly on selecting the initial cluster centers or on handling outliers, and do not consider that each data dimension contributes differently to the clustering. This paper therefore proposes CVDP k-means, a differentially private k-means clustering algorithm based on the coefficient of variation. The CVDP scheme first removes outliers from the original data using data density, and then uses the coefficient of variation to design a weighted similarity measure between data points and a method for selecting the initial centroids. Experimental results show that the CVDP k-means algorithm achieves some improvement in availability, performance, and privacy.
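The two ingredients named in the abstract, per-dimension weighting by coefficient of variation and Laplace noise on the cluster centers, can be illustrated with a minimal sketch. This is not the paper's CVDP implementation: the function names (`cv_weights`, `weighted_distance`, `noisy_centroids`) are hypothetical, the weights are assumed to be the normalized coefficients of variation, the data is assumed to be scaled to [0, 1] in every dimension with non-zero feature means, and the noise is applied in a commonly used way (to per-cluster sums and counts, with `epsilon` taken as the budget for a single iteration).

```python
import numpy as np


def cv_weights(X):
    """Per-dimension weights from the coefficient of variation (std / |mean|).

    Assumption: every feature has a non-zero mean. Weights are normalized
    to sum to 1; this normalization is an illustrative choice.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    cv = std / np.abs(mean)
    return cv / cv.sum()


def weighted_distance(x, c, w):
    """Weighted Euclidean distance between a point x and a centroid c."""
    return np.sqrt(np.sum(w * (x - c) ** 2))


def noisy_centroids(X, labels, k, epsilon, rng):
    """One differentially private centroid update.

    Assumption: X lies in [0, 1]^d, so adding or removing one record changes
    the (sum, count) pair by at most d + 1 in L1 norm. Laplace noise with
    scale (d + 1) / epsilon is added to each cluster's sum and count.
    """
    d = X.shape[1]
    scale = (d + 1) / epsilon
    centroids = np.empty((k, d))
    for j in range(k):
        members = X[labels == j]
        noisy_sum = members.sum(axis=0) + rng.laplace(0.0, scale, size=d)
        noisy_count = len(members) + rng.laplace(0.0, scale)
        # Guard against a tiny or negative noisy count.
        centroids[j] = noisy_sum / max(noisy_count, 1.0)
    return centroids


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((200, 3))          # toy data already in [0, 1]^3
    w = cv_weights(X)
    centers = X[:2].copy()            # naive initial centroids, for illustration only
    labels = np.array(
        [np.argmin([weighted_distance(x, c, w) for c in centers]) for x in X]
    )
    print(noisy_centroids(X, labels, k=2, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives a larger noise scale and therefore stronger privacy but less accurate centroids, which is the utility trade-off the abstract describes.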

Publisher

IOS Press

Subject

Artificial Intelligence, General Engineering, Statistics and Probability

