SnakeLines: integrated set of computational pipelines for sequencing reads
Author:
Budiš Jaroslav (1,2,3), Krampl Werner (1,3,4), Kucharík Marcel (1,3), Hekel Rastislav (1,2,4), Goga Adrián (3,5), Sitarčík Jozef (1,2,3), Lichvár Michal (1,3), Smol’ak Dávid (1,4), Böhmer Miroslav (1,3,4), Baláž Andrej (1,6), Ďuriš František (1,2), Gazdarica Juraj (1,2), Šoltys Katarína (3,4), Turňa Ján (2,3,4), Radvánszky Ján (1,3,7), Szemes Tomáš (1,3,4)
Affiliation:
1. Geneton Ltd., 841 04 Bratislava, Slovakia
2. Slovak Centre of Scientific and Technical Information, 811 04 Bratislava, Slovakia
3. Comenius University Science Park, 841 04 Bratislava, Slovakia
4. Department of Molecular Biology, Faculty of Natural Sciences, Comenius University, 841 04 Bratislava, Slovakia
5. Department of Computer Science, Faculty of Mathematics, Physics and Informatics, Comenius University, 841 04 Bratislava, Slovakia
6. Department of Applied Informatics, Faculty of Mathematics, Physics and Informatics, Comenius University, 841 04 Bratislava, Slovakia
7. Institute of Clinical and Translational Research, Biomedical Research Center, Slovak Academy of Sciences, 845 05 Bratislava, Slovakia
Abstract
With the rapid growth of massively parallel sequencing technologies, a growing number of laboratories are utilising sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on bioinformatics processing, which is often too demanding for clinicians and researchers without a computational background. Another problem is the reproducibility of computational analyses across separate computational centres with inconsistent versions of installed libraries and bioinformatics tools. We propose an easily extensible set of computational pipelines, called SnakeLines, for processing sequencing reads, including mapping, assembly, variant calling, viral identification, transcriptomics, and metagenomics analysis. Individual steps of an analysis, along with the methods and their parameters, can be readily modified in a single configuration file. The provided pipelines are embedded in virtual environments that ensure isolation of required resources from the host operating system, rapid deployment, and reproducibility of analyses across different Unix-based platforms. SnakeLines is a powerful framework for the automation of bioinformatics analyses, with emphasis on simple set-up, modification, extensibility, and reproducibility. The framework is already routinely used in various research projects and their applications, notably in the Slovak national surveillance of SARS-CoV-2.
Funder
Operational program Integrated Infrastructure, co-financed by the European Regional Development Fund; Agentúra na Podporu Výskumu a Vývoja (Slovak Research and Development Agency)
Publisher
Walter de Gruyter GmbH