Abstract
In genomics, bioinformatics pipelines are central to processing and analyzing large biological datasets. These pipelines consist of interconnected tasks and can be made more efficient and scalable by running them on cloud platforms such as Microsoft Azure. The choice of compute resources introduces a trade-off between cost and time. This paper introduces an approach that uses Linear Programming (LP) to optimize pipeline execution. We consider two competing cases: minimizing cost subject to a limit on run duration, and minimizing duration subject to a limit on cost. Our results show that LP can guide researchers toward informed compute decisions based on their specific datasets, cost and time requirements, and resource constraints.
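To make the two cases concrete, one possible formulation is sketched here. It is an illustration only, not the model used in the paper: it assumes a pipeline of n tasks executed one after another, m candidate virtual machine types, and known per-task estimates of cost c_{ij} and duration t_{ij} for running task i on machine type j. The symbols x_{ij}, T_max, and C_max are likewise assumptions introduced for this sketch.

```latex
% Case 1: minimize total cost subject to a limit T_max on run duration
\begin{aligned}
\min_{x}\quad & \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{m} x_{ij} = 1, \qquad i = 1,\dots,n, \\
& \sum_{i=1}^{n}\sum_{j=1}^{m} t_{ij}\,x_{ij} \le T_{\max}, \\
& x_{ij} \ge 0 .
\end{aligned}

% Case 2: minimize total duration subject to a limit C_max on cost
\begin{aligned}
\min_{x}\quad & \sum_{i=1}^{n}\sum_{j=1}^{m} t_{ij}\,x_{ij} \\
\text{s.t.}\quad & \sum_{j=1}^{m} x_{ij} = 1, \qquad i = 1,\dots,n, \\
& \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij}\,x_{ij} \le C_{\max}, \\
& x_{ij} \ge 0 .
\end{aligned}
```

In this sketch x_{ij} is the share of task i assigned to machine type j. Requiring x_{ij} to be 0 or 1 would select exactly one machine type per task, at the price of turning the LP into an integer program; tasks that run in parallel would also need a different duration constraint than the simple sum used here.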
Publisher
Cold Spring Harbor Laboratory