Evaluating Online Labor Markets for Experimental Research: Amazon.com's Mechanical Turk

Authors:

Adam J. Berinsky, Gregory A. Huber, and Gabriel S. Lenz

Abstract

We examine the trade-offs associated with using Amazon.com's Mechanical Turk (MTurk) interface for subject recruitment. We first describe MTurk and its promise as a vehicle for performing low-cost and easy-to-field experiments. We then assess the internal and external validity of experiments performed using MTurk, employing a framework that can be used to evaluate other subject pools. We first investigate the characteristics of samples drawn from the MTurk population. We show that respondents recruited in this manner are often more representative of the U.S. population than in-person convenience samples—the modal sample in published experimental political science—but less representative than subjects in Internet-based panels or national probability samples. Finally, we replicate important published experimental work using MTurk samples.

Publisher

Cambridge University Press (CUP)

Subject

Political Science and International Relations, Sociology and Political Science

References (71 articles)

1. We also find a somewhat different pattern of signs for the coefficients on the control variables—notably education and income. However, the coefficients on these variables—both in our analysis and in the original article by Kam and Simas—fall short of statistical significance by a wide margin.

2. Another promise of MTurk is as an inexpensive tool for conducting panel studies. Panel studies offer several potential advantages. For example, recent research in political science on the rate at which treatment effects decay (Chong and Druckman 2010; Gerber, Gimpel, Green, and Shaw 2011) has led to concerns that survey experiments may overstate the effects of manipulations relative to what one would observe over longer periods of time. For this reason, scholars are interested in mechanisms for exposing respondents to experimental manipulations and then measuring treatment effects over the long term. Panels also allow researchers to conduct pretreatment surveys and then administer a treatment distant from that initial measurement (allowing time to serve as a substitute for a distracter task). Another potential use of a panel study is to screen a large population and then to select from that initial pool of respondents a subset who better match desired sample characteristics.

The MTurk interface provides a mechanism for performing these sorts of panel studies. To conduct a panel survey, the researcher first fields a task as described above. Next, the researcher posts a new task on the MTurk workspace. We recommend that this task be clearly labeled as open only to prior research participants. Finally, the researcher notifies those workers she wishes to perform the new task of its availability. We have written and tested a customizable Perl script that does just this (see the Supplementary data; a sketch of the same workflow appears after this note). In particular, after it is edited to work with the researcher's MTurk account and to describe the new task, it interacts with the Amazon.com API to send messages through the MTurk interface to each invited worker. As with any other task, workers can be directed to an external Web site and asked to submit a code to receive payment.

Our initial experiences with using MTurk to perform panel studies are positive. In one study, respondents were offered 25 cents for a 3-min follow-up survey conducted 8 days after a first-wave survey. Two reminders were sent. Within 5 days, 68% of the original respondents took the follow-up. In a second study, respondents were offered 50 cents for a 3-min follow-up survey conducted 1–3 months after a first-wave interview. Within 8 days, almost 60% of the original respondents took the follow-up. Consistent with our findings, Buhrmester, Kwang, and Gosling (2011) report a two-wave panel study, conducted 3 weeks apart, also achieving a 60% response rate. They paid respondents 50 cents for the first wave and 50 cents for the second. Analysis of our two studies suggests that the demographic profile does not change significantly in the follow-up survey. Based on these results, we see no obstacle to oversampling demographic or other groups in follow-up surveys, which could allow researchers to study specific groups or improve the representativeness of samples.
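The authors distribute their recontact tool as a Perl script in the Supplementary data. As an illustration only, the sketch below reimplements the same three-step workflow (restrict the follow-up task to first-wave respondents, post it, notify them) in Python using the boto3 MTurk client; the worker IDs, task details, payment amount, and the external-question XML file are hypothetical placeholders, and this is not the authors' code.

```python
# A minimal sketch of the panel-study recontact workflow described in note 2,
# assuming AWS credentials with MTurk requester access are already configured.
import boto3

# Production endpoint; swap in the requester sandbox endpoint when testing.
mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    endpoint_url="https://mturk-requester.us-east-1.amazonaws.com",
)

# Worker IDs recorded during the first-wave survey (hypothetical values).
first_wave_workers = ["AEXAMPLE00001", "AEXAMPLE00002"]

# Step 1: create a qualification marking first-wave respondents, so the
# follow-up task is visible only to them.
qual_id = mturk.create_qualification_type(
    Name="First-wave panel respondent",
    Description="Completed wave 1; eligible for the follow-up survey.",
    QualificationTypeStatus="Active",
)["QualificationType"]["QualificationTypeId"]

for worker_id in first_wave_workers:
    mturk.associate_qualification_with_worker(
        QualificationTypeId=qual_id,
        WorkerId=worker_id,
        IntegerValue=1,
        SendNotification=False,
    )

# Step 2: post the follow-up task, clearly labeled as open only to prior
# participants and gated on the qualification above. The ExternalQuestion
# XML (a placeholder file here) directs workers to the off-site survey,
# where they receive a completion code to submit for payment.
hit = mturk.create_hit(
    Title="Follow-up survey (prior participants only)",
    Description="A 3-minute follow-up to a survey you completed earlier.",
    Reward="0.25",
    MaxAssignments=len(first_wave_workers),
    LifetimeInSeconds=5 * 24 * 3600,      # keep the task open for 5 days
    AssignmentDurationInSeconds=30 * 60,  # 30 minutes per assignment
    Question=open("external_question.xml").read(),
    QualificationRequirements=[{
        "QualificationTypeId": qual_id,
        "Comparator": "EqualTo",
        "IntegerValues": [1],
        "ActionsGuarded": "DiscoverPreviewAndAccept",
    }],
)

# Step 3: notify invited workers through the MTurk interface
# (NotifyWorkers accepts at most 100 worker IDs per call).
for start in range(0, len(first_wave_workers), 100):
    mturk.notify_workers(
        Subject="Follow-up survey now available",
        MessageText="A short paid follow-up to the survey you took is open now.",
        WorkerIds=first_wave_workers[start:start + 100],
    )
```

Reminders, such as the two sent in the first study described above, are simply additional notify_workers calls restricted to workers who have not yet completed the follow-up.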

3. Sorokin, A., and D. Forsyth. 2008. Utility data annotation with Amazon Mechanical Turk. Computer Vision and Pattern Recognition Workshops '08.

4. Gerber, A. S., J. G. Gimpel, D. P. Green, and D. R. Shaw. 2011. How Large and Long-lasting Are the Persuasive Effects of Televised Campaign Ads? Results from a Randomized Field Experiment. American Political Science Review.

5. Ballew, C. C., and A. Todorov. 2007. Predicting political elections from rapid and unreflective face judgments. Proceedings of the National Academy of Sciences.
