Sustainable data analysis with Snakemake

Published: 2021-01-18 Volume: 10 Page: 33
ISSN: 2046-1402
Container-title: F1000Research
Language: en
Short-container-title: F1000Res

Author:

Mölder Felix, Jablonski Kim Philipp, Letcher Brice, Hall Michael B., Tomkins-Tinch Christopher H., Sochat Vanessa, Forster Jan, Lee Soohyun, Twardziok Sven O., Kanitz Alexander, Wilm Andreas, Holtgrewe Manuel, Rahmann Sven, Nahnsen Sven, Köster Johannes

Abstract

Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.

Funder

Deutsche Stiftung für Herzforschung

Netherlands Organisation for Scientific Research

Google LLC

United States National Science Foundation Graduate Research Fellowship Program

Publisher

F1000 Research Ltd

Subject

General Pharmacology, Toxicology and Pharmaceutics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine

Link

https://f1000research.com/articles/10-33/v1/pdf

Cited by 611 articles.

1. NCBench: providing an open, reproducible, transparent, adaptable, and continuous benchmark approach for DNA-sequencing-based variant calling;F1000Research;2024-09-12

2. detectEVE: fast, sensitive and precise detection of endogenous viral elements in genomic data;2024-09-08

3. Identification of transcription factor co-binding patterns with non-negative matrix factorization;Nucleic Acids Research;2024-09-01

4. The potential of human leukocyte antigen alleles to assist with multiple-contributor DNA mixtures: Proof of concept study;Science & Justice;2024-09

5. Offshore power and hydrogen networks for Europe’s North Sea;Applied Energy;2024-09