Raiders of the lost HARK: a reproducible inference framework for big data science

Published:2019-10-22 Issue:1 Volume:5
ISSN:2055-1045
Container-title:Palgrave Communications
language:en
Short-container-title:Palgrave Commun

Author:

Prosperi Mattia, Bian Jiang, Buchan Iain E., Koopman James S., Sperrin Matthew, Wang Mo

Abstract

Hypothesizing after the results are known (HARK) has been disparaged as data dredging, and safeguards including hypothesis preregistration and statistically rigorous oversight have been recommended. Despite potential drawbacks, HARK has deepened thinking about complex causal processes. Some of the HARK precautions can conflict with the modern reality of researchers’ obligations to use big, ‘organic’ data sources, from high-throughput genomics to social media streams. We here propose a HARK-solid, reproducible inference framework suitable for big data, based on models that represent formalization of hypotheses. Reproducibility is attained by employing two levels of model validation: internal (relative to data collated around hypotheses) and external (independent of the hypotheses used to generate data or of the data used to generate hypotheses). With a model-centered paradigm, the reproducibility focus changes from the ability of others to reproduce both data and specific inferences from a study to the ability to evaluate models as representations of reality. Validation underpins ‘natural selection’ in a knowledge base maintained by the scientific community. The community itself is thereby supported to be more productive in generating and critically evaluating theories that integrate wider, complex systems.

Publisher

Springer Science and Business Media LLC

Subject

General Economics, Econometrics and Finance; General Psychology; General Social Sciences; General Arts and Humanities

Link

http://www.nature.com/articles/s41599-019-0340-8.pdf

References: 90 articles.

1. van Aert RCM, Wicherts JM, van Assen MALM (2016) Conducting meta-analyses based on p values: Reservations and recommendations for applying p-uniform and p-curve. Perspect Psychological Sci 11(5):713–729. https://doi.org/10.1177/1745691616650874

2. Allen CPG, Mehler DMA (2018) Open science challenges, benefits and tips in early career and beyond. PLoS Biol https://doi.org/10.31234/osf.io/3czyt

3. Amrhein V, Korner-Nievergelt F, Roth T (2017) The earth is flat (p > 0.05): significance thresholds and the crisis of unreplicable research. PeerJ 5:e3544. https://doi.org/10.7717/peerj.3544

4. Arango C (2017) Candidate gene associations studies in psychiatry: time to move forward. Eur Arch Psychiatry Clin Neurosci 267(1):1–2. https://doi.org/10.1007/s00406-016-0765-7

5. Baker M (2016) 1,500 scientists lift the lid on reproducibility. Nature 533(7604):452–454. https://doi.org/10.1038/533452a

Cited by 10 articles.

1. A framework for enhancing the replicability of behavioral MIS research using prediction oriented techniques;International Journal of Information Management;2024-10

2. The benefits and pitfalls of machine learning for biomarker discovery;Cell and Tissue Research;2023-07-27

3. Open science, closed doors: The perils and potential of open science for research in practice;Industrial and Organizational Psychology;2022-12

4. Evidence for HARKing in mouse behavioural tests of anxiety;2022-12-01

5. Severe testing with high-dimensional omics data for enhancing biomedical scientific discovery;npj Systems Biology and Applications;2022-10-21