Author:
Almodaresi Fatemeh, Pandey Prashant, Patro Rob
Abstract
The colored de Bruijn graph, a variant of the de Bruijn graph which associates each edge (i.e., k-mer) with some set of colors, is an increasingly important combinatorial structure in computational biology. Iqbal et al. demonstrated the utility of this structure for representing and assembling a collection (population) of genomes, and showed how it can be used to accurately detect genetic variants. Muggli et al. introduced VARI, a representation of the colored de Bruijn graph that adopts the BOSS representation for the de Bruijn graph topology and achieves considerable savings in space over Cortex, albeit with some sacrifice in speed. The memory-efficient representation of VARI allows the colored de Bruijn graph to be constructed and analyzed for large datasets, beyond what is possible with Cortex. In this paper, we introduce Rainbowfish, a succinct representation of the color information of the colored de Bruijn graph that reduces the space usage even further. Our representation also uses BOSS to represent the de Bruijn graph, but decomposes the color sets based on an equivalence relation and exploits the inherent skewness in the distribution of these color sets. The Rainbowfish representation is compressed based on the 0th-order entropy of the color sets, which can lead to a significant reduction in the space required to store the relevant information for each edge. In practice, Rainbowfish achieves up to a 20× improvement in space over VARI. Rainbowfish is written in C++11 and is available at https://github.com/COMBINE-lab/rainbowfish.
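The idea of decomposing color sets by an equivalence relation and exploiting their skewed distribution can be illustrated with a small sketch. This is not Rainbowfish's C++ code: the function names `build_color_classes` and `zeroth_order_entropy` and the toy input are hypothetical, and it assumes that two edges are equivalent exactly when their color sets are identical. Each edge then stores only a class ID, and the 0th-order entropy of those IDs gives a lower bound, in bits per edge, for any code that encodes each ID independently.

```python
from collections import Counter
from math import log2


def build_color_classes(edge_colors):
    """Group edges by identical color set (assumed equivalence relation).

    Returns (class_table, edge_labels): class_table[i] is the i-th distinct
    color set, ordered from most to least frequent, and edge_labels[j] is
    the class ID assigned to edge j.
    """
    counts = Counter(frozenset(colors) for colors in edge_colors)
    # Frequent classes get the smallest IDs, so a variable-length code can
    # give them the shortest codewords; ties are broken deterministically.
    ordered = sorted(counts, key=lambda s: (-counts[s], sorted(s)))
    class_id = {s: i for i, s in enumerate(ordered)}
    labels = [class_id[frozenset(colors)] for colors in edge_colors]
    return ordered, labels


def zeroth_order_entropy(labels):
    """Empirical 0th-order entropy of the label sequence, in bits per label."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


# Toy input (hypothetical): 8 edges over 3 samples (colors 0, 1, 2).
toy_edges = [{0, 1}, {0, 1}, {0, 1}, {2}, {0, 1}, {2}, {0, 1, 2}, {0, 1}]
table, labels = build_color_classes(toy_edges)
bits_per_edge = zeroth_order_entropy(labels)
```

In this toy input, a plain bit matrix would spend 3 bits per edge (one per color). The skew toward the class `{0, 1}` is what lets an entropy-based encoding of the class IDs use fewer bits per edge, at the cost of storing each distinct color set once in the class table.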
Publisher
Cold Spring Harbor Laboratory
References: 17 articles.
1. Alexander Bowe, Taku Onodera, Kunihiko Sadakane, and Tetsuo Shibuya. Succinct de Bruijn graphs. In Proceedings of the International Workshop on Algorithms in Bioinformatics, pages 225–235. Springer, 2012.
2. Whole genome resequencing in tomato reveals variation associated with introgression and breeding events
3. Simon Gog. Succinct data structure library. https://github.com/simongog/sdsl-lite, 2017. [online; accessed 01-Feb-2017].
4. Rodrigo Gonzalez, Szymon Grabowski, Veli Makinen, and Gonzalo Navarro. Practical implementation of rank and select queries. In Poster Proceedings Volume of 4th Workshop on Efficient and Experimental Algorithms (WEA), pages 27–38, 2005.
5. GENCODE: The reference human genome annotation for The ENCODE Project
Cited by
26 articles.