Abstract
Background: Standardizing and documenting computational analyses are necessary to ensure reproducible results. It is especially important for large and complex projects where data collection, analysis, and interpretation may span decades. Our objective is therefore to provide methods, tools, and best practice guidelines adapted for analyses in epidemiological studies that use -omics data.
Results: We describe an R-based implementation of data management and preprocessing. The method is well-integrated with the analysis tools typically used for statistical analysis of -omics data. We document all datasets thoroughly and use version control to track changes to both datasets and code over time. We provide a web application to perform the standardized preprocessing steps for gene expression datasets. We provide best practices for reporting data analysis results and sharing analyses.
Conclusion: We have used these tools to organize data storage and documentation, and to standardize the analysis of gene expression data, in the Norwegian Women and Cancer (NOWAC) system epidemiology study. We believe our approach and lessons learned are applicable to analyses in other large and complex epidemiology projects.
Publisher
Cold Spring Harbor Laboratory