Authors:
Rautiainen Mikko, Mäkinen Veli, Marschall Tobias
Abstract
Graphs are commonly used to represent sets of sequences. Either edges or nodes can be labeled by sequences, so that each path in the graph spells a concatenated sequence. Examples include graphs that represent genome assemblies, such as string graphs and de Bruijn graphs, and graphs that represent a pan-genome and hence the genetic variation present in a population. Aligning sequencing reads to such graphs is a key step in many analyses, with applications including genome assembly, read error correction, and variant calling with respect to a variation graph. Here, we generalize two linear sequence-to-sequence algorithms to graphs: the Shift-And algorithm for exact matching and Myers' bitvector algorithm for semi-global alignment. Both linear algorithms process w sequence characters with a constant number of operations, where w is the word size of the machine (commonly 64), and thereby achieve a speedup of w over naive algorithms. Our bitvector-based graph alignment algorithm reaches a worst-case runtime of O(V + ⌈m/w⌉ E log w) for acyclic graphs and O(V + mE log w) for arbitrary cyclic graphs. We apply it to four different types of graphs and observe a speedup between 3.1-fold and 10.1-fold compared to previous algorithms.
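To make the bit-parallel idea concrete, the following is a minimal sketch of the classic linear (sequence-to-sequence) Shift-And algorithm, not the graph generalization described in the abstract. The function name `shift_and` and the example strings are illustrative assumptions. Python integers are arbitrary-precision, so a single integer stands in for the bitvector here; a machine-word implementation would split the pattern into ⌈m/w⌉ words.

```python
def shift_and(pattern: str, text: str) -> list[int]:
    """Return the start positions of all exact occurrences of pattern in text.

    Illustrative sketch of linear Shift-And (hypothetical helper, not the
    paper's graph algorithm).
    """
    m = len(pattern)
    if m == 0:
        return []

    # masks[c] has bit i set iff pattern[i] == c.
    masks: dict[str, int] = {}
    for i, c in enumerate(pattern):
        masks[c] = masks.get(c, 0) | (1 << i)

    accept = 1 << (m - 1)  # bit that signals a full-pattern match
    state = 0              # bit i set iff pattern[:i+1] matches a suffix of the text read so far
    hits = []
    for j, c in enumerate(text):
        # Extend every partial match by one character and start a new one,
        # then keep only those whose next pattern character equals c.
        state = ((state << 1) | 1) & masks.get(c, 0)
        if state & accept:
            hits.append(j - m + 1)
    return hits


print(shift_and("ACA", "GACACA"))  # overlapping occurrences are reported
```

Each text character costs one shift, one OR, and one AND on the whole state vector, which is where the factor-w speedup over a character-by-character comparison comes from when the pattern fits in one machine word.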
Publisher
Cold Spring Harbor Laboratory