Abstract
Large sample size (N) is seen as a key criterion in judging the replicability of psychological research, a phenomenon we refer to as the N-Heuristic. This heuristic has incentivized fast, online, non-behavioral studies, to the potential detriment of psychological science. While large N should in principle increase statistical power and thus the replicability of effects, in practice it may not: large-N studies may have other attributes that undercut their power or validity. Consolidating data from all systematic, large-scale replication attempts (N = 307 original-replication study pairs), we find that an original study's sample size did not predict its likelihood of being replicated (r_s = -0.02, p = 0.741), even with study design and research area controlled. By contrast, effect size emerged as a substantial predictor (r_s = 0.21, p < 0.001), a relationship that held regardless of the study's sample size. N may be a poor predictor of replicability because studies with larger N investigated smaller effects (r_s = -0.49, p < 0.001). Contrary to these results, a survey of 215 professional psychologists, who were presented with a comprehensive list of methodological criteria, found that sample size was rated the most important criterion for judging a study's replicability. Our findings strike a cautionary note regarding the prioritization of large N in judging the replicability of psychological science.
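The headline statistics above are Spearman rank correlations between study attributes (sample size, effect size) and replication outcomes. As a minimal illustrative sketch of how such a correlation is computed, the snippet below pairs each original study's N with a binary replication indicator; the data and variable names are hypothetical placeholders, not the paper's dataset or analysis code.

```python
# Sketch: Spearman rank correlation between an original study's sample
# size and whether its effect replicated. Values here are hypothetical
# placeholders, not the 307 study pairs analyzed in the paper.
from scipy.stats import spearmanr

original_n = [40, 120, 85, 2400, 300, 60, 150, 900]   # original sample sizes
replicated = [1, 0, 1, 0, 1, 1, 0, 0]                  # 1 = replicated, 0 = not

rho, p_value = spearmanr(original_n, replicated)
print(f"r_s = {rho:.2f}, p = {p_value:.3f}")
```

Spearman's r_s is used rather than Pearson's r because it depends only on ranks, so it tolerates the heavy right skew typical of sample-size distributions and the binary replication outcome.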
Publisher
Public Library of Science (PLoS)