Human-powered sorts and joins

Published:2011-09 Issue:1 Volume:5 Page:13-24
ISSN:2150-8097
Container-title:Proceedings of the VLDB Endowment
language:en
Short-container-title:Proc. VLDB Endow.

Author:

Marcus Adam¹, Wu Eugene¹, Karger David¹, Madden Samuel¹, Miller Robert¹

Affiliation:

1. MIT CSAIL

Abstract

Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
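The abstract names two of the optimizations, task batching and replacing pairwise comparisons with numerical ratings, without showing why they lower cost. The following is a minimal sketch of the counting argument only, not the paper's implementation: comparing every pair of N items takes N(N-1)/2 judgments, while rating each item once takes N, and batching several judgments into one posted task divides the number of paid tasks further. All function names, the batch size, the number of workers per task, and the price in cents are hypothetical values chosen for illustration; they are not figures from the paper.

```python
from math import ceil


def pairwise_judgments(n_items: int) -> int:
    """Judgments needed to compare every unordered pair of items once."""
    return n_items * (n_items - 1) // 2


def rating_judgments(n_items: int) -> int:
    """Judgments needed when each item receives one numerical rating."""
    return n_items


def tasks_needed(judgments: int, batch_size: int) -> int:
    """Posted tasks when batch_size judgments are grouped into one task."""
    return ceil(judgments / batch_size)


def crowd_cost_cents(judgments: int, batch_size: int,
                     workers_per_task: int, cents_per_task: int) -> int:
    """Total cost in cents; every parameter here is an illustrative assumption."""
    return tasks_needed(judgments, batch_size) * workers_per_task * cents_per_task


if __name__ == "__main__":
    n = 40  # hypothetical number of items to sort
    for label, judgments in (("pairwise", pairwise_judgments(n)),
                             ("rating", rating_judgments(n))):
        cost = crowd_cost_cents(judgments, batch_size=5,
                                workers_per_task=5, cents_per_task=1)
        print(f"{label}: {judgments} judgments, {cost} cents")
```

Under these assumed parameters the gap between the two strategies grows quadratically with the number of items, which is the intuition behind preferring ratings where their accuracy is acceptable.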

Publisher

VLDB Endowment

Subject

General Earth and Planetary Sciences; Water Science and Technology; Geography, Planning and Development

Link

https://dl.acm.org/doi/pdf/10.14778/2047485.2047487

Cited by 100 articles.

1. Enhancing entity resolution with multichannel BERT: a comprehensive approach;Third International Conference on Algorithms, Microchips, and Network Applications (AMNA 2024);2024-06-08

3. Algorithmic Complexity Attacks on Dynamic Learned Indexes;Proceedings of the VLDB Endowment;2023-12

4. Crowdsourcing of labeling image objects: an online gamification application for data collection;Multimedia Tools and Applications;2023-08-04

5. Efficient crowdsourced best objects finding via superiority probability based ordering for decision support systems;Expert Systems with Applications;2023-08