MirrorFair: Fixing Fairness Bugs in Machine Learning Software via Counterfactual Predictions

Published:2024-07-12 Issue:FSE Volume:1 Page:2121-2143
ISSN:2994-970X
Container-title:Proceedings of the ACM on Software Engineering
language:en
Short-container-title:Proc. ACM Softw. Eng.

Author:

Xiao Ying¹, Zhang Jie M.², Liu Yepang³, Mousavi Mohammad Reza², Liu Sicen³, Xue Dingyuan³

Affiliation:

1. Southern University of Science and Technology, Shenzhen, China / King's College London, London, United Kingdom

2. King's College London, London, United Kingdom

3. Southern University of Science and Technology, Shenzhen, China

Abstract

With the increasing utilization of Machine Learning (ML) software in critical domains such as employee hiring, college admission, and credit evaluation, ensuring fairness in the decision-making processes of underlying models has emerged as a paramount ethical concern. Nonetheless, existing methods for rectifying fairness issues can hardly strike a consistent trade-off between performance and fairness across diverse tasks and algorithms. Informed by the principles of counterfactual inference, this paper introduces MirrorFair, an innovative adaptive ensemble approach designed to mitigate fairness concerns. MirrorFair initially constructs a counterfactual dataset derived from the original data, training two distinct models—one on the original dataset and the other on the counterfactual dataset. Subsequently, MirrorFair adaptively combines these model predictions to generate fairer final decisions. We conduct an extensive evaluation of MirrorFair and compare it with 15 existing methods across a diverse range of decision-making scenarios. Our findings reveal that MirrorFair outperforms all the baselines in every measurement (i.e., fairness improvement, performance preservation, and trade-off metrics). Specifically, in 93% of cases, MirrorFair surpasses the fairness and performance trade-off baseline proposed by the benchmarking tool Fairea, whereas the state-of-the-art method achieves this in only 88% of cases. Furthermore, MirrorFair consistently demonstrates its superiority across various tasks and algorithms, ranking first in balancing model performance and fairness in 83% of scenarios. To foster replicability and future research, we have made our code, data, and results openly accessible to the research community.
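The pipeline the abstract outlines (build a counterfactual dataset, train one model on each dataset, combine the two predictions) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how counterfactuals are built or how the combination adapts, so the sketch assumes a single binary protected attribute coded as -1/+1, counterfactuals made by negating that attribute, a hand-written logistic regression as the learner, and a fixed weight `alpha` in place of the adaptive weighting. All function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    """Probability of the favourable label; w[-1] is the bias term."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logreg(X, y, epochs=300, lr=0.5):
    """Logistic regression fitted with plain batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def flip_protected(X, idx):
    """Assumed counterfactual: negate the protected attribute (coded -1/+1)."""
    return [row[:idx] + [-row[idx]] + row[idx + 1:] for row in X]

def ensemble_proba(w_orig, w_cf, x, alpha=0.5):
    """Fixed-weight blend; a stand-in for the paper's adaptive combination."""
    return alpha * predict_proba(w_orig, x) + (1.0 - alpha) * predict_proba(w_cf, x)

# Hypothetical toy data: column 0 is the protected attribute, column 1 a score.
# Labels are deliberately biased in favour of the group coded +1.
X = [[a, s / 5] for a in (-1, 1) for s in range(6)]
y = [1 if s + (2 if a == 1 else 0) >= 3 else 0 for a in (-1, 1) for s in range(6)]

w_orig = train_logreg(X, y)                   # model on the original data
w_cf = train_logreg(flip_protected(X, 0), y)  # model on the counterfactual data

# Compare two individuals who differ only in the protected attribute.
gap_orig = predict_proba(w_orig, [1, 0.4]) - predict_proba(w_orig, [-1, 0.4])
gap_ens = (ensemble_proba(w_orig, w_cf, [1, 0.4])
           - ensemble_proba(w_orig, w_cf, [-1, 0.4]))
print(f"original gap: {gap_orig:.3f}, ensemble gap: {gap_ens:.3f}")
```

In this symmetric toy setup the counterfactual model learns the mirrored weight on the protected attribute, so an equal-weight blend cancels its influence. Choosing the weights adaptively, as the abstract describes, is presumably what lets the method trade some of that cancellation for predictive performance; the sketch does not attempt to reproduce that step.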

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3660801

References (63 articles)

1. 1994. The German dataset. https://archive.ics.uci.edu/ml/datasets/statlog+(german+credit+data)

2. 2014. The Bank dataset. https://archive.ics.uci.edu/ml/datasets/bank+marketing

3. 2015. The Mep dataset. https://meps.ahrq.gov/mepsweb/data_stats/download_data_files.jsp

4. 2016. The Compas dataset. https://github.com/propublica/compas-analysis

5. 2017. The Adult Census Income dataset. https://archive.ics.uci.edu/ml/datasets/adult