Abstract
Next Generation Sequencing (NGS) workloads largely consist of pipelines of tasks with heterogeneous compute, memory, and storage requirements. Identifying the optimal system configuration for such pipelines has historically required expertise in both system architecture and bioinformatics. This paper presents infrastructure recommendations for one commonly used genomics workload, derived from extensive benchmarking and profiling, together with guidance on tuning genomics workflows for high performance computing (HPC) infrastructure. The methodology and lessons learned can be extended to other genomics workloads and to other infrastructures, such as the cloud.
Publisher
Cold Spring Harbor Laboratory