CGAT-core: a python framework for building scalable, reproducible computational biology workflows

Published: 2019-07-16
Volume: 8
Page: 377
ISSN: 2046-1402
Container-title: F1000Research
Language: en
Short-container-title: F1000Res

Author:

Cribbs Adam P., Luna-Valero Sebastian, George Charlotte, Sudbery Ian M., Berlanga-Taylor Antonio J., Sansom Stephen N., Smith Tom, Ilott Nicholas E., Johnson Jethro, Scaber Jakub, Brown Katherine, Sims David, Heger Andreas

Abstract

In the genomics era, computational biologists regularly need to process, analyse and integrate large and complex biomedical datasets. Analysis inevitably involves multiple dependent steps, resulting in complex pipelines or workflows, often with several branches. Large data volumes mean that processing needs to be quick and efficient, and scientific rigour requires that analysis be consistent and fully reproducible. We have developed CGAT-core, a Python package for the rapid construction of complex computational workflows. CGAT-core seamlessly handles parallelisation across high-performance computing clusters, integration of Conda environments, full parameterisation, database integration and logging. To illustrate our workflow framework, we present a pipeline for the analysis of RNA-seq data using pseudo-alignment.
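The abstract's point that an analysis breaks down into dependent steps, often with branches, can be illustrated with a minimal, generic sketch built on Python's standard-library `graphlib` (Python 3.9+). This is not CGAT-core's API: the step names and the `execution_batches` helper are hypothetical, and the sketch only shows how a dependency graph yields batches of steps that could run in parallel, which is the kind of scheduling a workflow framework automates for the user.

```python
"""Generic sketch of ordering dependent workflow steps.

NOT CGAT-core's API: step names and helpers here are hypothetical.
"""
from graphlib import TopologicalSorter

# Hypothetical RNA-seq-style workflow: each step maps to the set of steps
# it depends on. The two per-sample quantification steps are independent
# branches that both depend on the index and both feed the merge step.
STEPS = {
    "build_index": set(),
    "quantify_sample_a": {"build_index"},
    "quantify_sample_b": {"build_index"},
    "merge_counts": {"quantify_sample_a", "quantify_sample_b"},
    "load_database": {"merge_counts"},
}


def execution_batches(steps):
    """Group steps into batches; steps within one batch have no mutual
    dependencies, so a scheduler could run them in parallel."""
    sorter = TopologicalSorter(steps)
    sorter.prepare()  # also raises CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark finished so dependants become ready
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(execution_batches(STEPS), start=1):
        print(f"batch {i}: {', '.join(batch)}")
```

In a real framework each batch would be dispatched as cluster jobs rather than printed; keeping the dependency graph separate from execution is what lets the same workflow run locally or on a cluster.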

Funder

Medical Research Council

Publisher

F1000 Research Ltd

Subject

General Pharmacology, Toxicology and Pharmaceutics; General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine

Link

https://f1000research.com/articles/8-377/v2/pdf

References: 15 articles

1. E Afgan. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2016 update. Nucleic Acids Res., 2016.

2. K Wolstencroft. The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud. Nucleic Acids Res., 2013.

3. K Okonechnikov. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics, 2012.

4. O Golosova. Unipro UGENE NGS pipelines and components for variant calling, RNA-seq and ChIP-seq data analyses. PeerJ, 2014.

5. J Nocq. Harnessing virtual machines to simplify next-generation DNA sequencing analysis. Bioinformatics, 2013.

Cited by 29 articles.

1. Panpipes: a pipeline for multiomic single-cell and spatial transcriptomic data analysis. Genome Biology, 2024-07-08.

2. Cellular and axonal transport phenotypes due to the C9ORF72 HRE in iPSC motor and sensory neurons. Stem Cell Reports, 2024-07.

3. Benchmarking tRNA-Seq quantification approaches by realistic tRNA-Seq data simulation identifies two novel approaches with higher accuracy. 2024-06-21.

4. Correcting PCR amplification errors in unique molecular identifiers to generate accurate numbers of sequencing molecules. Nature Methods, 2024-02-05.