Abstract
Genomic data analysis across multiple cloud platforms remains a challenge, especially when large amounts of data are involved. Here, we present Swarm, a framework for federated computation that promotes minimal data motion and facilitates crosstalk between genomic datasets stored on different cloud platforms. We demonstrate its utility via common queries of genomic variants across BigQuery on the Google Cloud Platform (GCP), Athena on Amazon Web Services (AWS), Apache Presto, and MySQL. Compared to single-cloud platforms, the Swarm framework significantly reduced computational costs, run-time delays, and the risks of security breaches and privacy violations.
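The idea of minimal data motion can be illustrated with a small sketch. This is not Swarm's actual API: two in-memory SQLite databases stand in for variant tables on separate platforms (such as BigQuery and Athena), and the table schema, the function names `make_store` and `federated_lookup`, and the sample rows are all hypothetical. The same filter runs inside each store, and only the matching rows are moved to the caller.

```python
import sqlite3


def make_store(rows):
    """Create an in-memory SQLite table standing in for one platform's variant store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants (rsid TEXT, chrom TEXT, pos INTEGER, allele_freq REAL)"
    )
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?)", rows)
    return conn


def federated_lookup(stores, rsid):
    """Run the same filter in every store; only matching rows leave each store."""
    results = []
    for name, conn in stores.items():
        cur = conn.execute(
            "SELECT rsid, chrom, pos, allele_freq FROM variants WHERE rsid = ?",
            (rsid,),
        )
        for row in cur.fetchall():
            # Tag each row with the (hypothetical) platform it came from.
            results.append((name,) + row)
    return results


# Hypothetical sample data: two stores that share one variant.
stores = {
    "platform_a": make_store([("rs123", "1", 1000, 0.12), ("rs456", "2", 2000, 0.30)]),
    "platform_b": make_store([("rs123", "1", 1000, 0.15), ("rs789", "3", 3000, 0.05)]),
}

hits = federated_lookup(stores, "rs123")
rows_moved = len(hits)
rows_total = sum(
    conn.execute("SELECT COUNT(*) FROM variants").fetchone()[0]
    for conn in stores.values()
)
print(f"moved {rows_moved} of {rows_total} rows")
```

In this toy setup the filter is pushed down to each store, so the amount of data transferred scales with the size of the answer rather than with the size of the stored tables.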
Funder
Veterans Affairs Office of Research and Development Cooperative Studies Program
National Institutes of Health
Schmidt Futures program
Publisher
Public Library of Science (PLoS)
Subject
Computational Theory and Mathematics; Cellular and Molecular Neuroscience; Genetics; Molecular Biology; Ecology; Modelling and Simulation; Ecology, Evolution, Behavior and Systematics
Cited by
3 articles.