A Game Theory Approach for Estimating Reliability of Crowdsourced Relevance Assessments

Published:2022-07-31 Issue:3 Volume:40 Page:1-29
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Moshfeghi Yashar¹, Huertas-Rosero Alvaro Francisco²

Affiliation:

1. University of Strathclyde, Glasgow, UK

2. Centre for Human Ecology, Glasgow, UK

Abstract

In this article, we propose an approach to improve quality in crowdsourcing (CS) tasks using Task Completion Time (TCT) as a source of information about the reliability of workers in a game-theoretical competitive scenario. Our approach is based on the hypothesis that some workers are more risk-inclined and tend to gamble with their use of time when made to compete with other workers. This hypothesis is supported by our previous simulation study. We test our approach with 35 topics from experiments on the TREC-8 collection being assessed as relevant or non-relevant by crowdsourced workers both in a competitive (referred to as “Game”) and non-competitive (referred to as “Base”) scenario. We find that competition changes the distributions of TCT, making them sensitive to the quality (i.e., wrong or right) and outcome (i.e., relevant or non-relevant) of the assessments. We also test an optimal function of TCT as weights in a weighted majority voting scheme. From probabilistic considerations, we derive a theoretical upper bound for the weighted majority performance of cohorts of 2, 3, 4, and 5 workers, which we use as a criterion to evaluate the performance of our weighting scheme. We find that our approach significantly closes the gap between the accuracy of the obtained relevance judgements and the upper bound. Since our approach takes advantage of TCT, which is an available quantity in any CS task, we believe it is cost-effective and, therefore, can be applied for quality assurance in crowdsourcing for micro-tasks.
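
The weighted majority voting scheme mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state the optimal weight function, so `example_tct_weight` below is a hypothetical placeholder (longer completion time gives a larger weight, capped at 1), and the function names, the `scale` parameter, and the tie-breaking rule are all assumptions made for illustration.

```python
from typing import Sequence


def weighted_majority_vote(labels: Sequence[int], weights: Sequence[float]) -> int:
    """Aggregate binary judgements (1 = relevant, 0 = non-relevant).

    Each worker's label counts with that worker's weight. Ties go to
    non-relevant, which is an arbitrary choice for this sketch.
    """
    if len(labels) != len(weights):
        raise ValueError("labels and weights must have the same length")
    relevant = sum(w for label, w in zip(labels, weights) if label == 1)
    non_relevant = sum(w for label, w in zip(labels, weights) if label == 0)
    return 1 if relevant > non_relevant else 0


def example_tct_weight(tct_seconds: float, scale: float = 30.0) -> float:
    """Hypothetical weight from Task Completion Time (NOT the paper's function).

    Grows linearly with time spent and saturates at 1.0 after `scale` seconds.
    """
    return min(max(tct_seconds, 0.0) / scale, 1.0)


# Three workers judge one document; the first spent far longer on the task.
labels = [1, 0, 0]
tcts = [40.0, 5.0, 10.0]
weights = [example_tct_weight(t) for t in tcts]

unweighted = weighted_majority_vote(labels, [1.0, 1.0, 1.0])
weighted = weighted_majority_vote(labels, weights)
```

With equal weights the two non-relevant votes win, whereas under the placeholder weighting the single slow worker outweighs the two fast ones. Whether slower or faster workers deserve more weight in practice is an empirical question that the paper addresses and this sketch does not.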

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3480965

Reference: 71 articles.

1. Omar Alonso. 2013. Implementing crowdsourcing-based relevance experimentation: An industrial perspective. Info. Retriev. 16, 2 (2013), 1–20.

2. Omar Alonso and Ricardo Baeza-Yates. 2011. Design and implementation of relevance assessments using crowdsourcing. In Proceedings of the 33rd ECIR Conference Advances in Information Retrieval (ECIR’11). Springer, 153–164.

3. Omar Alonso and Stefano Mizzaro. 2009. Can we get rid of TREC assessors? Using mechanical turk for relevance assessment. In Proceedings of the SIGIR Workshop on the Future of IR Evaluation. ACM, 557–566.

4. Omar Alonso and Stefano Mizzaro. 2012. Using crowdsourcing for TREC relevance assessment. Info. Process. Manage. 48, 6 (2012), 1053–1066. http://dx.doi.org/10.1016/j.ipm.2012.01.004

5. Omar Alonso, Daniel E. Rose, and Benjamin Stewart. 2008. Crowdsourcing for relevance evaluation. SIGIR Forum 42, 2 (2008), 9–15. http://dx.doi.org/10.1145/1480506.1480508

Cited by 2 articles.

1. The State of Pilot Study Reporting in Crowdsourcing: A Reflection on Best Practices and Guidelines;Proceedings of the ACM on Human-Computer Interaction;2024-04-17

2. Extending Label Aggregation Models with a Gaussian Process to Denoise Crowdsourcing Labels;Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval;2023-07-18