Simplitigs as an efficient and scalable representation of de Bruijn graphs

Published: 2020-01-12

Author:

Karel Břinda, Michael Baym, Gregory Kucherov

Abstract

De Bruijn graphs play an essential role in computational biology. However, despite their widespread use, they lack a universal scalable representation suitable for different types of genomic data sets. Here, we introduce simplitigs as a compact, efficient and scalable representation and present a fast algorithm for their computation. On examples of several model organisms and two bacterial pan-genomes, we show that, compared to the best existing representation, simplitigs provide a substantial improvement in the cumulative sequence length and their number, especially for graphs with many branching nodes. We demonstrate that this improvement is amplified with more data available. Combined with the commonly used Burrows-Wheeler Transform index of genomic sequences, simplitigs substantially reduce both memory and index loading and query times, as illustrated with large-scale examples of GenBank bacterial pan-genomes.

Publisher

Cold Spring Harbor Laboratory

References: 83 articles.

1. A general method applicable to the search for similarities in the amino acid sequence of two proteins

2. Identification of common molecular subsequences

3. An improved algorithm for matching biological sequences

4. A New Algorithm for DNA Sequence Assembly

5. l-Tuple DNA Sequencing: Computer Analysis

Cited by 10 articles.

1. Applications of de Bruijn graphs in microbiome research;iMeta;2022-03

2. Disk compression of k-mer sets;Algorithms for Molecular Biology;2021-06-21

3. Simplitigs as an efficient and scalable representation of de Bruijn graphs;Genome Biology;2021-04-06

4. Representation of k-Mer Sets Using Spectrum-Preserving String Sets;Journal of Computational Biology;2021-04-01

5. Data Structures to Represent a Set of k-long DNA Sequences;ACM Computing Surveys;2021-04