Abstract
In this paper, we address the problem of selectivity estimation in a crowdsourced database. Specifically, we develop several techniques for using workers on a crowdsourcing platform such as Amazon's Mechanical Turk to estimate the fraction of items in a dataset (e.g., a collection of photos) that satisfy some property or predicate (e.g., photos of trees), without explicitly iterating through every item in the dataset. Such estimates are important in crowdsourced query optimization, to support predicate ordering, and in query evaluation, when performing a GROUP BY operation with a COUNT or AVG aggregate. We compare sampling item labels, a traditional approach, to showing workers a collection of items and asking them to estimate how many satisfy some predicate. Additionally, we develop techniques to eliminate spammers and colluding attackers who try to skew selectivity estimates under this count-based approach. We find that for images, counting can be much more effective than sampled labeling, reducing the amount of work needed to arrive at an estimate within 1% of the true fraction by up to an order of magnitude, with lower worker latency. We also find that sampled labeling outperforms count estimation on a text-processing task, presumably because people are better at quickly scanning large batches of images than at reading strings of text. Our spammer detection technique, which applies to both the label- and count-based approaches, can improve accuracy by up to two orders of magnitude.
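To make the two estimation strategies concrete, the following is a minimal Python sketch of a label-sampling estimator, a count-based estimator, and a stand-in outlier filter. The function names, the batch size of 50, and the median-deviation tolerance are illustrative assumptions; the abstract does not specify the paper's actual estimators or its spammer-detection algorithm, which is more sophisticated than the simple filter shown here.

```python
import random
import statistics

def estimate_by_sampled_labels(labels):
    """Label-sampling baseline: each worker labels one sampled item
    (True if it satisfies the predicate); the selectivity estimate
    is the fraction of positive labels."""
    return sum(labels) / len(labels)

def estimate_by_counting(batch_counts, batch_size):
    """Count-based approach: each worker sees a batch of `batch_size`
    items and reports how many satisfy the predicate; the estimate
    is the mean reported fraction across batches."""
    return statistics.mean(c / batch_size for c in batch_counts)

def filter_outliers(batch_counts, batch_size, tolerance=0.15):
    """Hypothetical spammer filter: drop responses whose reported
    fraction deviates from the median by more than `tolerance`.
    This is only a stand-in for the idea of discarding skewed
    estimates from spammers or colluders."""
    fractions = [c / batch_size for c in batch_counts]
    med = statistics.median(fractions)
    return [c for c, f in zip(batch_counts, fractions)
            if abs(f - med) <= tolerance]

if __name__ == "__main__":
    random.seed(0)
    true_fraction, batch_size = 0.30, 50
    # Simulate 40 honest workers counting batches, plus 5 colluders
    # who claim every item in their batch matches the predicate.
    honest = [sum(random.random() < true_fraction for _ in range(batch_size))
              for _ in range(40)]
    spam = [batch_size] * 5
    responses = honest + spam
    print("naive count estimate:   ", estimate_by_counting(responses, batch_size))
    cleaned = filter_outliers(responses, batch_size)
    print("filtered count estimate:", estimate_by_counting(cleaned, batch_size))
```

Running the simulation shows the colluders pulling the naive estimate well above the true fraction, while the median-based filter brings it back near 0.30, illustrating why skew-resistant aggregation matters for the count-based approach.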