Abstract
Compacted de Bruijn graphs are one of the most fundamental data structures in computational genomics. Colored compacted de Bruijn graphs are a variant built on a collection of sequences that associates each k-mer with the sequences in which it appears. We present GGCAT, a tool for constructing both types of graphs. It is based on a new approach that merges the k-mer counting step with the unitig construction step, and on numerous practical optimizations. For compacted de Bruijn graph construction, GGCAT achieves speed-ups of 3–21× compared to the state-of-the-art tool Cuttlefish 2 (Khan and Patro, Genome Biology, 2022). When constructing the colored variant, GGCAT achieves speed-ups of 5–39× compared to the state-of-the-art tool BiFrost (Holley and Melsted, Genome Biology, 2020). Additionally, GGCAT is up to 480× faster than BiFrost for batch sequence queries on colored graphs.
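To make the two data structures concrete, the following is a minimal, naive Python sketch. It is not GGCAT's algorithm. It first records, for each k-mer, the set of input sequences ("colors") it occurs in. It then glues k-mers along maximal non-branching paths into unitigs, which are the nodes of the compacted graph. All function names are hypothetical. For simplicity the sketch assumes the DNA alphabet ACGT, ignores reverse complements (real tools typically work with canonical k-mers), and keeps everything in memory. It also runs counting and compaction as two separate passes, whereas GGCAT merges these steps.

```python
from collections import defaultdict

BASES = "ACGT"  # assumption: plain DNA alphabet, no reverse complements


def colored_kmers(seqs, k):
    """Map every k-mer to the set of sequence indices ("colors") containing it."""
    colors = defaultdict(set)
    for color, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            colors[seq[i:i + k]].add(color)
    return dict(colors)


def successors(kmer, kmers):
    """k-mers that overlap `kmer` by k-1 characters on its right."""
    return [kmer[1:] + b for b in BASES if kmer[1:] + b in kmers]


def predecessors(kmer, kmers):
    """k-mers that overlap `kmer` by k-1 characters on its left."""
    return [b + kmer[:-1] for b in BASES if b + kmer[:-1] in kmers]


def unitigs(kmers):
    """Compact maximal non-branching paths of k-mers into unitig strings."""
    visited = set()

    def merges_with_predecessor(x):
        # x continues its predecessor's unitig only if the link is unambiguous
        # on both sides: x has one predecessor, and that one has one successor.
        preds = predecessors(x, kmers)
        return len(preds) == 1 and len(successors(preds[0], kmers)) == 1

    def walk(start):
        # Extend rightwards while the next link is unambiguous.
        visited.add(start)
        path, x = start, start
        while True:
            succ = successors(x, kmers)
            if len(succ) != 1:
                break
            y = succ[0]
            if y in visited or len(predecessors(y, kmers)) != 1:
                break
            visited.add(y)
            path += y[-1]  # consecutive k-mers share k-1 characters
            x = y
        return path

    result = []
    # Unitigs begin at k-mers that cannot be merged with a predecessor...
    for x in sorted(kmers):
        if not merges_with_predecessor(x):
            result.append(walk(x))
    # ...except isolated cycles, where every k-mer merges; break them anywhere.
    for x in sorted(kmers):
        if x not in visited:
            result.append(walk(x))
    return result


kmers = colored_kmers(["ACGTA", "ACGTT"], k=3)
print(unitigs(kmers))  # ['ACGT', 'GTA', 'GTT']
print(kmers["GTA"])    # {0}
```

In the example, the shared prefix of the two sequences collapses into the single unitig `ACGT`, which carries both colors. The graph then branches into `GTA` (color 0 only) and `GTT` (color 1 only). Storing colors per k-mer, as done here, is the simplest option. Practical tools use far more compact color representations.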
Publisher
Cold Spring Harbor Laboratory
Cited by
9 articles.