A fair-multicluster approach to clustering of categorical data

Published:2022-11-08 Issue:2 Volume:31 Page:583-604
ISSN:1435-246X
Container-title:Central European Journal of Operations Research
language:en
Short-container-title:Cent Eur J Oper Res

Author:

Santos-Mangudo Carlos, Heras Antonio J.

Abstract

In the last few years, the need to prevent classification biases due to race, gender, social status, etc. has increased interest in designing fair clustering algorithms. The main idea is to ensure that the output of a clustering algorithm is not biased towards or against specific subgroups of the population. There is a growing specialized literature on this topic, dealing with the problem of clustering numerical databases. Nevertheless, to our knowledge, there are no previous papers devoted to the problem of fair clustering of purely categorical attributes. In this paper, we show that the Multicluster methodology proposed by Santos and Heras (Interdiscip J Inf Knowl Manag 15:227–246, 2020. https://doi.org/10.28945/4643) for clustering categorical data can be modified in order to increase the fairness of the clusters. Of course, there is a trade-off between fairness and efficiency, so that an increase in the fairness objective usually leads to a loss of classification efficiency. Yet it is possible to reach a reasonable compromise between these goals, since the methodology proposed by Santos and Heras (2020) can be easily adapted in order to get homogeneous and fair clusters.

Funder

Universidad Complutense de Madrid

Publisher

Springer Science and Business Media LLC

Subject

Management Science and Operations Research

Link

https://link.springer.com/content/pdf/10.1007/s10100-022-00824-2.pdf

References: 51 articles.

1. Abraham SSPD, Sundaram S. S (2020) Fairness in clustering with multiple sensitive attributes. In: Advances in database technology—EDBT, pp 287–298. arXiv:1910.05113

2. Ahmad A, Dey L (2007a) A method to compute distance between two categorical values of same attribute in unsupervised learning for categorical data set. Pattern Recognit Lett 28(1):110–118. https://doi.org/10.1016/j.patrec.2006.06.006

3. Ahmad A, Dey L (2007b) A k-mean clustering algorithm for mixed numeric and categorical data. Data Knowl Eng 63(2):503–527. https://doi.org/10.1016/j.datak.2007.03.016

4. Altaf S, Waseem Waseem M, Kazmi L (2020) IDCUP algorithm to classifying arbitrary shapes and densities for center-based clustering performance analysis. Interdiscip J Inf Knowl Manag 15:91–108. https://doi.org/10.28945/4541

5. Barocas S, Selbst AD (2016) Big data’s disparate impact. Calif Law Rev 104(3):671–732. https://doi.org/10.2139/ssrn.2477899

Cited by 1 article.

1. Detecting Outliers in Context of Clustering Imbalanced Categorical Data; International Conference on Information Systems Development; 2024-09-09