Affiliation:
1. University of Padua, Padova, Italy
Abstract
We propose the Assessor-driven Weighted Averages for Retrieval Evaluation (AWARE) probabilistic framework, a novel methodology for dealing with multiple crowd assessors that may be contradictory and/or noisy. By modeling relevance judgements and crowd assessors as sources of uncertainty, AWARE takes the expectation of a generic performance measure, such as Average Precision, composed with these random variables. In this way, it approaches the problem of aggregating different crowd assessors from a new perspective: it directly combines the performance measures computed on the ground truths generated by the crowd assessors, instead of adopting a classification technique to merge the labels they produce. We propose several unsupervised estimators that instantiate the AWARE framework and compare them with state-of-the-art approaches, namely Majority Vote and Expectation Maximization, on TREC collections. We found that the AWARE approaches improve over these baselines in their capability of correctly ranking systems and predicting their actual performance scores.
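As a rough illustration of the idea sketched in the abstract, the snippet below contrasts merging labels first (majority vote) with an AWARE-style combination that scores the system against each assessor's ground truth and then takes a weighted average of the resulting measures. The toy data, function names, and uniform weights are illustrative assumptions only, not the paper's actual estimators.

```python
# Illustrative sketch (not the paper's implementation): combine crowd assessors by
# averaging per-assessor Average Precision (AP) scores instead of merging their labels.

def average_precision(ranking, relevant):
    """AP of a ranked list of document ids against a set of relevant ids."""
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / max(len(relevant), 1)

# Toy data: one system ranking and three (possibly noisy) crowd assessors.
ranking = ["d1", "d2", "d3", "d4", "d5"]
assessors = [
    {"d1", "d3"},        # documents judged relevant by assessor 1
    {"d1", "d4"},        # assessor 2
    {"d2", "d3", "d5"},  # assessor 3
]

# Label-merging baseline: majority vote over labels, then a single AP score.
docs = set().union(*assessors)
majority = {d for d in docs if sum(d in a for a in assessors) > len(assessors) / 2}
ap_majority = average_precision(ranking, majority)

# AWARE-style combination: compute AP against each assessor's ground truth, then take
# a weighted average; uniform weights stand in for the paper's unsupervised estimators.
weights = [1 / len(assessors)] * len(assessors)
ap_aware = sum(w * average_precision(ranking, a) for w, a in zip(weights, assessors))

print(f"majority-vote AP: {ap_majority:.3f}, AWARE-style weighted AP: {ap_aware:.3f}")
```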
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications, General Business, Management and Accounting, Information Systems
Cited by
6 articles.