Authors:
Alipanahi Bahar, Kuhnle Alan, Puglisi Simon J., Salmela Leena, Boucher Christina
Abstract
Motivation: The de Bruijn graph is one of the fundamental data structures for the analysis of high-throughput sequencing data. To be applicable to population-scale studies, the graph must be built and stored in a space- and time-efficient manner. In addition, because population studies change continually, it has become essential to update the graph after construction, e.g., to add and remove nodes and edges. Although substantial effort has gone into making the construction and storage of the graph efficient, little work addresses building the graph in a way that is both efficient and mutable. Hence, most space-efficient data structures require complete reconstruction of the graph in order to add or remove edges or nodes.

Results: In this paper, we present DynamicBOSS, a succinct representation of the de Bruijn graph that allows for an unlimited number of additions and deletions of nodes and edges. We compare our method with competing methods and demonstrate that DynamicBOSS is the only method that both supports addition and deletion and is applicable to very large samples (e.g., greater than 15 billion k-mers). Competing dynamic methods either cannot be constructed on large-scale datasets, e.g., FDBG (Crawford et al., 2018), or cannot support both addition and deletion, e.g., BiFrost (Holley and Melsted, 2019).

Availability: DynamicBOSS is publicly available at https://github.com/baharpan/dynboss.

Contact: baharpan@ufl.edu
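To make the notion of a mutable de Bruijn graph concrete, the following is a minimal sketch of edge insertion and deletion, assuming edges are k-mers and nodes are their (k-1)-mer prefixes and suffixes. The class name `NaiveDynamicDeBruijnGraph` and its methods are hypothetical and written only for illustration. The sketch stores k-mers in a plain Python set, so it is not succinct and does not reflect how DynamicBOSS is implemented.

```python
class NaiveDynamicDeBruijnGraph:
    """Toy mutable de Bruijn graph (illustration only, not DynamicBOSS).

    Edges are k-mers; nodes are the (k-1)-mer prefix and suffix of each edge.
    A plain set is used, so this is neither succinct nor space-efficient.
    """

    ALPHABET = "ACGT"

    def __init__(self, k):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.edges = set()  # every k-mer currently in the graph

    def add_edge(self, kmer):
        """Insert a single k-mer as an edge."""
        if len(kmer) != self.k:
            raise ValueError("expected a k-mer of length %d" % self.k)
        self.edges.add(kmer)

    def remove_edge(self, kmer):
        """Delete a k-mer; removing an absent k-mer is a no-op."""
        self.edges.discard(kmer)

    def add_sequence(self, seq):
        """Insert every k-mer that occurs in seq."""
        for i in range(len(seq) - self.k + 1):
            self.add_edge(seq[i:i + self.k])

    def nodes(self):
        """Return all (k-1)-mers that are an endpoint of some edge."""
        result = set()
        for kmer in self.edges:
            result.add(kmer[:-1])  # source node
            result.add(kmer[1:])   # target node
        return result

    def successors(self, node):
        """Return the nodes reachable from node by one edge, in ACGT order."""
        return [
            (node + base)[1:]
            for base in self.ALPHABET
            if node + base in self.edges
        ]


if __name__ == "__main__":
    graph = NaiveDynamicDeBruijnGraph(3)
    graph.add_sequence("ACGTC")   # edges: ACG, CGT, GTC
    print(graph.successors("CG"))
    graph.remove_edge("CGT")      # update without rebuilding the graph
    graph.add_edge("CGA")
    print(graph.successors("CG"))
```

In this toy version, updates are trivial because nothing is compressed. The difficulty the abstract describes arises when the graph is stored succinctly, where an insertion or deletion would otherwise force reconstruction.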
Publisher
Cold Spring Harbor Laboratory