Author:
Ferrara Antonio, Bonchi Francesco, Fabbri Francesco, Karimi Fariba, Wagner Claudia
Abstract
Human feedback is often used, either directly or indirectly, as input to algorithmic decision making. However, humans are biased: if the algorithm that takes the human feedback as input does not control for potential biases, the result might be biased algorithmic decision making, which can have a tangible impact on people’s lives. In this paper, we study how to detect and correct for evaluators’ bias in the task of ranking people (or items) from pairwise comparisons. Specifically, we assume we are given pairwise comparisons of the items to be ranked, produced by a set of evaluators. While the evaluators’ pairwise assessments should reflect, to a certain extent, the latent (unobservable) true quality scores of the items, they might be affected by each evaluator’s own bias against, or in favor of, some groups of items. By detecting and amending evaluators’ biases, we aim to produce a ranking of the items that agrees, as much as possible, with the ranking one would produce with access to the latent quality scores. Our proposal is a novel method that extends the classic Bradley-Terry model with a bias parameter for each evaluator, which distorts the true quality score of each item depending on the group the item belongs to. Thanks to the simplicity of the model, we are able to write its log-likelihood explicitly w.r.t. the parameters (i.e., the items’ latent scores and the evaluators’ biases) and optimize it by means of an alternating approach. Our experiments on synthetic and real-world data confirm that our method is able to reconstruct the bias of each individual evaluator extremely well, and thus to outperform several non-trivial competitors in the task of producing a ranking that is as close as possible to the unbiased ranking.
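The abstract does not state the functional form of the model, so the following is only a minimal sketch of the general idea, not the authors' formulation. It assumes two groups (coded 0 and 1), latent scores on a log scale, and an evaluator bias that is added to the score of every group-1 item before the usual Bradley-Terry comparison. The names `win_probability`, `log_likelihood`, and all parameter names are hypothetical.

```python
import math


def win_probability(score_i, score_j, group_i, group_j, bias):
    """Probability that an evaluator prefers item i over item j.

    ASSUMPTION (not from the paper): the evaluator's bias is an additive
    shift applied to the log-scale score of items in group 1.
    """
    perceived_i = score_i + bias * group_i
    perceived_j = score_j + bias * group_j
    # Standard Bradley-Terry form on log-scale scores (a logistic function).
    return 1.0 / (1.0 + math.exp(perceived_j - perceived_i))


def log_likelihood(comparisons, scores, groups, biases):
    """Sum of log-probabilities of the observed outcomes.

    comparisons: iterable of (evaluator, winner, loser) triples.
    scores, groups: dicts keyed by item; biases: dict keyed by evaluator.
    """
    total = 0.0
    for evaluator, winner, loser in comparisons:
        p = win_probability(
            scores[winner], scores[loser],
            groups[winner], groups[loser],
            biases[evaluator],
        )
        total += math.log(p)
    return total


# Toy data: two equally good items in different groups, two evaluators.
scores = {"a": 0.0, "b": 0.0}
groups = {"a": 0, "b": 1}
biases = {"fair": 0.0, "harsh": -1.0}  # "harsh" penalizes group-1 items
comparisons = [("fair", "a", "b"), ("harsh", "a", "b")]

print(log_likelihood(comparisons, scores, groups, biases))
```

Under this assumed form the bias cancels whenever both items belong to the same group, so only cross-group comparisons carry information about an evaluator's bias. An alternating scheme of the kind the abstract mentions would maximize this log-likelihood over the scores with the biases held fixed, then over the biases with the scores held fixed, and repeat.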
Funder
European Union’s Horizon 2020 for the project: “NoBIAS - Artificial Intelligence without Bias”
Publisher
Springer Science and Business Media LLC
References: 41 articles.
1. Almaatouq A, Krafft P, Dunham Y, Rand DG, Pentland A (2020) Turkers of the world unite: multilevel in-group bias among crowdworkers on Amazon Mechanical Turk. Soc Psychol Personal Sci 11(2):151–159
2. Alvarez JM, Ruggieri S (2023) Counterfactual situation testing: Uncovering discrimination under fairness given the difference. Preprint arXiv:2302.11944
3. Beaver RJ, Gokhale D (1975) A model to incorporate within-pair order effects in paired comparisons. Commun Stat Theory Methods 4(10):923–939
4. Bradley RA, Terry ME (1952) Rank analysis of incomplete block designs: I. the method of paired comparisons. Biometrika 39(3/4):324–345
5. Bugakova N, Fedorova V, Gusev G, Drutsa A (2019) Aggregation of pairwise comparisons with reduction of biases. Preprint arXiv:1906.03711
Cited by
1 article.