Abstract
Objectives: Previous research has shown clear biases in the distribution of published p values, with an excess below the 0.05 threshold due to a combination of p-hacking and publication bias. We aimed to examine the bias for statistical significance using published confidence intervals.
Design: Observational study.
Setting: Papers published in Medline since 1976.
Participants: Over 968 000 confidence intervals extracted from abstracts and over 350 000 extracted from full texts.
Outcome measures: Cumulative distributions of lower and upper confidence interval limits for ratio estimates.
Results: We found an excess of statistically significant results, with a glut of lower limits just above 1 and upper limits just below 1. These excesses have not improved in recent years. The excesses did not appear in a set of over 100 000 confidence intervals that were not subject to p-hacking or publication bias.
Conclusions: The huge excesses of published confidence intervals that only just exclude 1, and so are only just statistically significant, are not statistically plausible. Large improvements in research practice are needed to provide more results that better reflect the truth.
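The outcome measure is the cumulative distribution of confidence interval limits extracted from published text. As a rough illustration only, and not the authors' actual extraction pipeline, the sketch below uses a simplified regular expression and a few invented abstract snippets to pull ratio-scale 95% CI limits and tabulate the empirical cumulative distribution of the lower limits around 1, where an excess of "just significant" results would appear as a jump just above 1.

```python
import re

# Hypothetical abstract snippets; the real study parsed over 968 000 CIs from Medline.
abstracts = [
    "The adjusted odds ratio was 1.45 (95% CI 1.02 to 2.06).",
    "Risk of relapse was lower in the treatment arm (HR 0.71, 95% CI 0.52 to 0.97).",
    "No association was found (relative risk 1.10, 95% CI 0.88 to 1.38).",
]

# Naive pattern for "95% CI <lower> to <upper>"; a real extraction would need to
# handle many more reporting styles (hyphens, brackets, non-ratio estimates, etc.).
ci_pattern = re.compile(r"95%\s*CI\s*(\d+\.\d+)\s*(?:to|,|-)\s*(\d+\.\d+)", re.IGNORECASE)

lower_limits, upper_limits = [], []
for text in abstracts:
    for lo, hi in ci_pattern.findall(text):
        lower_limits.append(float(lo))
        upper_limits.append(float(hi))

# Empirical cumulative distribution of the lower limits near 1: a step change
# just above 1 would indicate an excess of "just significant" ratio estimates.
def ecdf(values, grid):
    n = len(values)
    return [sum(v <= g for v in values) / n for g in grid]

grid = [0.90, 0.95, 0.99, 1.00, 1.01, 1.05, 1.10]
print(list(zip(grid, ecdf(lower_limits, grid))))
```

Applied to the full corpus, the same idea (compared against a reference set of intervals not subject to p-hacking or publication bias) is what reveals the excesses described in the Results.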
Funder
National Health and Medical Research Council