Demographic research with non-representative internet data
Author:
Zagheni Emilio, Weber Ingmar
Abstract
Purpose
– Internet data hold great promise for demographic research, but they come with severe drawbacks due to several types of bias. The purpose of this paper is to review the literature that uses internet data for demographic studies and to present a general framework for addressing the problem of selection bias in non-representative samples.
Design/methodology/approach
– The authors propose two main approaches to reduce bias. When ground truth data are available, the authors suggest a method that calibrates the online data against reliable official statistics. When no ground truth data are available, the authors propose a difference-in-differences approach to evaluate relative trends.
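The calibration idea can be illustrated with a minimal sketch. It assumes a simple linear relationship between an online estimate and the corresponding official statistic, fitted by ordinary least squares on locations where both are observed, and then applied where only the online estimate exists. The linear form, the function names `fit_linear_calibration` and `calibrate`, and all numbers are hypothetical illustrations, not the authors' exact model.

```python
# Hypothetical sketch: linear calibration of online estimates against
# official statistics (not the paper's exact specification).


def fit_linear_calibration(online, official):
    """Fit official = a + b * online by ordinary least squares.

    online, official: paired estimates for locations where ground truth exists.
    Returns the intercept a and the slope b.
    """
    n = len(online)
    if n != len(official) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(online) / n
    mean_y = sum(official) / n
    sxx = sum((x - mean_x) ** 2 for x in online)
    if sxx == 0:
        raise ValueError("online estimates must vary across locations")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(online, official))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def calibrate(online_value, a, b):
    """Map a raw online estimate onto the scale of the official statistic."""
    return a + b * online_value


if __name__ == "__main__":
    # Hypothetical calibration set: locations with both an online estimate
    # and a reliable official figure for the same indicator.
    online = [0.10, 0.20, 0.30, 0.40]
    official = [0.07, 0.12, 0.17, 0.22]
    a, b = fit_linear_calibration(online, official)
    # Apply the fitted relationship to a location with no official figure.
    print(round(calibrate(0.25, a, b), 3))
```

With these made-up data the fit is official = 0.02 + 0.5 × online, so a location whose online estimate is 0.25 is calibrated to 0.145. In practice the calibration model would likely include covariates such as age or sex, and it is only as good as the assumption that the online-to-official relationship carries over to the locations or time points being predicted.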
Findings
– The authors offer a generalization of existing techniques. Although there is no definite answer to the question of whether statistical inference can be made from non-representative samples, the authors show that, when certain assumptions are met, it is possible to extract signal from noisy and biased data.
Research limitations/implications
– The methods are sensitive to a number of assumptions. These include regularities in the way the bias changes across locations, across demographic groups and between time steps. The assumptions that the authors discuss might not always hold. In particular, the scenario in which the bias varies unpredictably and, at the same time, no “ground truth” is available to continuously calibrate the model remains challenging and is beyond the scope of this paper.
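The difference-in-differences approach named under Design/methodology/approach, and the kind of regularity in the bias it relies on, can be illustrated with a minimal sketch. It assumes that the online estimate equals the true rate multiplied by a bias that factors into a group component and a period component; on the log scale both components then cancel in a difference in differences, leaving the true relative trend. The multiplicative form, the function name `log_did`, and all numbers are hypothetical, not the authors' exact specification.

```python
import math


def log_did(est, treated, control, t0, t1):
    """Difference in differences of log estimates for two groups, two periods.

    est maps (group, period) -> estimate. A multiplicative bias of the form
    group_factor * period_factor cancels out of this quantity.
    """
    d_treated = math.log(est[(treated, t1)]) - math.log(est[(treated, t0)])
    d_control = math.log(est[(control, t1)]) - math.log(est[(control, t0)])
    return d_treated - d_control


# Hypothetical true rates: group A grows by 20%, group B stays flat.
true_rate = {("A", 0): 0.10, ("A", 1): 0.12, ("B", 0): 0.20, ("B", 1): 0.20}

# Hypothetical bias: group A is over-represented online and group B
# under-represented (group factor), while overall online coverage rises
# between periods (period factor).
group_bias = {"A": 2.0, "B": 0.5}
period_bias = {0: 1.0, 1: 1.5}
online = {
    (g, t): rate * group_bias[g] * period_bias[t]
    for (g, t), rate in true_rate.items()
}

did_online = log_did(online, "A", "B", 0, 1)
did_true = log_did(true_rate, "A", "B", 0, 1)
# The two values agree: the separable bias has cancelled.
print(round(did_online, 4), round(did_true, 4))
```

Both quantities equal log(1.2), the true relative growth of group A over group B, even though the raw online levels are badly biased. If the bias instead contains a group-by-period interaction (for example, online coverage growing faster in one group than the other), it no longer cancels, which is exactly the kind of irregular bias change described above as a limitation.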
Originality/value
– The paper combines a critical review of existing substantive and methodological literature with a generalization of prior techniques. It intends to provide a fresh perspective on the issue and to stimulate methodological discussion among social scientists.
Subject
Management of Technology and Innovation, Organizational Behavior and Human Resource Management, Strategy and Management
References: 36 articles
Cited by 66 articles.