A Methodology for Controlling Bias and Fairness in Synthetic Data Generation

Published:2022-05-04 Issue:9 Volume:12 Page:4619
ISSN:2076-3417
Container-title:Applied Sciences
language:en
Short-container-title:Applied Sciences

Author:

Barbierato Enrico, Vedova Marco L. Della, Tessera Daniele, Toti Daniele, Vanoli Nicola

Abstract

The development of algorithms, based on machine learning techniques, supporting (or even replacing) human judgment must take into account concepts such as data bias and fairness. Though scientific literature proposes numerous techniques to detect and evaluate these problems, less attention has been dedicated to methods generating intentionally biased datasets, which could be used by data scientists to develop and validate unbiased and fair decision-making algorithms. To this end, this paper presents a novel method to generate a synthetic dataset, where bias can be modeled by using a probabilistic network exploiting structural equation modeling. The proposed methodology has been validated on a simple dataset to highlight the impact of tuning parameters on bias and fairness, as well as on a more realistic example based on a loan approval status dataset. In particular, this methodology requires a limited number of parameters compared to other techniques for generating datasets with a controlled amount of bias and fairness.
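To make the idea of tunable bias concrete, here is a minimal sketch of a toy linear structural equation model. It is not the authors' probabilistic network: the function names, the parameters `direct_weight` and `indirect_weight`, the noise distributions, and the 0.5 decision threshold are all hypothetical choices made for illustration. A sensitive attribute A influences a feature X and, optionally, the decision Y directly; the gap in positive-decision rates between the two groups (demographic parity difference) is used as one possible fairness measure.

```python
import random


def generate_biased_dataset(n, direct_weight, indirect_weight=0.0, seed=0):
    """Sample n records (a, x, y) from a toy linear structural equation model.

    Assumed structure (illustrative only, not taken from the paper):
        A ~ Bernoulli(0.5)                         sensitive attribute
        X = indirect_weight * A + noise            feature influenced by A
        Y = 1 if X + direct_weight * A + noise > 0.5 else 0
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = 1 if rng.random() < 0.5 else 0
        x = indirect_weight * a + rng.gauss(0.0, 1.0)
        score = x + direct_weight * a + rng.gauss(0.0, 1.0)
        y = 1 if score > 0.5 else 0
        rows.append((a, x, y))
    return rows


def demographic_parity_difference(rows):
    """Return P(Y=1 | A=1) - P(Y=1 | A=0) estimated from the sample."""
    positives = {0: 0, 1: 0}
    counts = {0: 0, 1: 0}
    for a, _, y in rows:
        counts[a] += 1
        positives[a] += y
    return positives[1] / counts[1] - positives[0] / counts[0]


# Same sample size and seed, two settings of the hypothetical bias parameter.
unbiased = generate_biased_dataset(20000, direct_weight=0.0)
biased = generate_biased_dataset(20000, direct_weight=2.0)
gap_unbiased = demographic_parity_difference(unbiased)
gap_biased = demographic_parity_difference(biased)
```

With both weights at zero the two groups share the same decision distribution, so the measured gap should sit near zero up to sampling noise; raising `direct_weight` widens it. In this sketch two coefficients control the amount of bias, which loosely echoes the abstract's point about needing only a limited number of parameters.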

Publisher

MDPI AG

Subject

Fluid Flow and Transfer Processes, Computer Science Applications, Process Chemistry and Technology, General Engineering, Instrumentation, General Materials Science

Link

https://www.mdpi.com/2076-3417/12/9/4619/pdf

References: 24 articles.

1. Why Digital Transformation Is an Ongoing Journey https://www.forbes.com/sites/forbestechcouncil/2021/11/10/why-digital-transformation-is-an-ongoing-journey/?sh=672f83985bb9

2. 5 Key Factors Holding Small Businesses Back from Joining the “Data Revolution” https://medium.com/analytics-for-humans/5-key-factors-holding-small-businesses-back-from-joining-the-data-revolution-6b95618deb7f

3. Expert Panel, 13 Strategies For Collecting High-Quality Data https://www.forbes.com/sites/forbescommunicationscouncil/2020/11/17/13-strategies-for-collecting-high-quality-data/?sh=6a6a5b763f1d

4. Bias in computer systems

5. FairGAN: Fairness-aware Generative Adversarial Networks

Cited by 9 articles.

1. Generating synthetic data with variational autoencoder to address class imbalance of graph attention network prediction model for construction management;Advanced Engineering Informatics;2024-10

2. Bias and Cyberbullying Detection and Data Generation Using Transformer Artificial Intelligence Models and Top Large Language Models;Electronics;2024-08-29

3. Exploring Innovative Approaches to Synthetic Tabular Data Generation;Electronics;2024-05-17

4. Synthetic Dataset Generation for Fairer Unfairness Research;Proceedings of the 14th Learning Analytics and Knowledge Conference;2024-03-18

5. A Methodology and an Empirical Analysis to Determine the Most Suitable Synthetic Data Generator;IEEE Access;2024