Affiliations:
1. Department of Political Science and Geography, Old Dominion University, BAL 7000, Norfolk, VA 23529, USA
2. Department of Public Service, Gardner-Webb University, Boiling Springs, NC 28017, USA
Abstract
Big search data offers the opportunity to identify new and potentially real-time measures and predictors of important political, geographic, social, cultural, economic, and epidemiological phenomena, measures that might serve an important role as leading indicators in forecasts and nowcasts. However, it also presents vast new risks that scientists or the public will identify meaningless and totally spurious ‘relationships’ between variables. This study is the first to quantify that risk in the context of search data. We find that spurious correlations arise at exceptionally high frequencies among the random variables examined, which are based upon gamma(1, 1) and Gaussian random walk distributions. Quantifying these spurious correlations and their likely magnitude for various distributions has value for several reasons. First, analysts can make progress toward accurate inference. Second, they can avoid unwarranted credulity. Third, they can demand appropriate disclosure from study authors.
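As a rough illustration of the kind of risk the abstract describes, the sketch below generates many mutually independent Gaussian random walks and reports the largest absolute Pearson correlation found between any two of them, next to the same figure for the underlying independent increments. It is not the study's procedure: the function names, the number of series, the series length, and the seed are all hypothetical choices made for illustration, and the gamma(1, 1) case is not covered.

```python
import numpy as np


def max_abs_correlation(series):
    """Return the largest |Pearson r| between any two distinct rows."""
    corr = np.corrcoef(series)      # rows are treated as variables
    np.fill_diagonal(corr, 0.0)     # ignore each series' correlation with itself
    return float(np.abs(corr).max())


def simulate(n_series=200, n_steps=100, seed=0):
    """Compare independent Gaussian noise with independent Gaussian random walks.

    All parameter values are illustrative assumptions, not taken from the study.
    """
    rng = np.random.default_rng(seed)
    steps = rng.normal(size=(n_series, n_steps))  # independent Gaussian increments
    walks = np.cumsum(steps, axis=1)              # one random walk per row
    return max_abs_correlation(steps), max_abs_correlation(walks)


if __name__ == "__main__":
    noise_max, walk_max = simulate()
    print(f"max |r| among independent noise series: {noise_max:.2f}")
    print(f"max |r| among independent random walks: {walk_max:.2f}")
```

Every series here is independent of every other by construction, so any strong correlation the search turns up is spurious. Scanning all pairs and keeping the largest value mimics, in a simplified way, what happens when an analyst searches a large pool of candidate series for the best match.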
Cited by
1 article.