Abstract
AbstractThe de Bruijn graph is a key data structure in modern computational genomics, and construction of its compacted variant resides upstream of many genomic analyses. As the quantity of genomic data grows rapidly, this often forms a computational bottleneck. We present Cuttlefish 2, significantly advancing the state-of-the-art for this problem. On a commodity server, it reduces the graph construction time for 661K bacterial genomes, of size 2.58Tbp, from 4.5 days to 17–23 h; and it constructs the graph for 1.52Tbp white spruce reads in approximately 10 h, while the closest competitor requires 54–58 h, using considerably more memory.
Funder
National Institutes of Health
National Science Foundation
Narodowym Centrum Nauki
Publisher
Springer Science and Business Media LLC
Reference88 articles.
1. US National Library of Medicine. NCBI insights: the entire corpus of the sequence read archive (SRA) now live on two cloud platforms! Natl Cent Biotechnol Inf. 2020. https://ncbiinsights.ncbi.nlm.nih.gov/2020/02/24/sra-cloud/. Accessed 8 Nov 2021.
2. Stephens ZD, Lee SY, Faghri F, Campbell RH, Zhai C, Efron MJ, Iyer R, Schatz MC, Sinha S, Robinson GE. Big data: Astronomical or genomical?PLoS Biol. 2015; 13(7):1–11. https://doi.org/10.1371/journal.pbio.1002195.
3. Nayfach S, Roux S, Seshadri R, Udwary D, Varghese N, Schulz F, Wu D, Paez-Espino D, Chen I-M, Huntemann M, Palaniappan K, Ladau J, et al.A genomic catalog of earth’s microbiomes. Nat Biotechnol. 2021; 39(4):499–509. https://doi.org/10.1038/s41587-020-0718-6.
4. Gevers D, Knight R, Petrosino JF, Huang K, McGuire AL, Birren BW, Nelson KE, White O, Methé BA, Huttenhower C. The human microbiome project: A community resource for the healthy human microbiome. PLOS Biol. 2012; 10(8):1–5. https://doi.org/10.1371/journal.pbio.1001377.
5. Muir P, Li S, Lou S, Wang D, Spakowicz DJ, Salichos L, Zhang J, Weinstock GM, Isaacs F, Rozowsky J, et al. The real cost of sequencing: scaling computation to keep pace with data generation. Genome Biol. 2016; 17(1):1–9.
Cited by
18 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献