Distance indexing and seed clustering in sequence graphs

Published: 2020-07-01 Issue: Supplement_1 Volume: 36 Page: i146-i153
ISSN: 1367-4803
Container-title: Bioinformatics
Language: en

Author:

Chang Xian¹, Eizenga Jordan¹, Novak Adam M¹, Sirén Jouni¹, Paten Benedict¹

Affiliation:

1. Department of Biomolecular Engineering, University of California Santa Cruz Genomics Institute, Santa Cruz, CA 95060, USA

Abstract

Motivation: Graph representations of genomes are capable of expressing more genetic variation and can therefore better represent a population than standard linear genomes. However, due to the greater complexity of genome graphs relative to linear genomes, some functions that are trivial on linear genomes become much more difficult in genome graphs. Calculating distance is one such function that is simple in a linear genome but complicated in a graph context. In read mapping algorithms such distance calculations are fundamental to determining if seed alignments could belong to the same mapping.

Results: We have developed an algorithm for quickly calculating the minimum distance between positions on a sequence graph using a minimum distance index. We have also developed an algorithm that uses the distance index to cluster seeds on a graph. We demonstrate that our implementations of these algorithms are efficient and practical to use for a new generation of mapping algorithms based upon genome graphs.

Availability and implementation: Our algorithms have been implemented as part of the vg toolkit and are available at https://github.com/vgteam/vg.
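
To make the two problems named in the abstract concrete, here is a minimal Python sketch of the naive, index-free baseline: a Dijkstra search for the minimum distance between two graph positions, and a pairwise clustering of seeds under a distance limit. It is not the paper's minimum distance index or the vg implementation, and the abstract does not describe their internals. The function names (`min_distance`, `cluster_seeds`), the `(node, offset)` position encoding, the single-orientation directed graph, and the toy bubble graph are all assumptions made for illustration.

```python
import heapq
from itertools import combinations


def min_distance(lengths, edges, a, b):
    """Fewest bases to walk from position a to position b, or None if unreachable.

    Positions are (node, offset) pairs; lengths maps node -> sequence length;
    edges maps node -> list of successor nodes. Orientation is ignored
    (an assumption of this sketch).
    """
    (a_node, a_off), (b_node, b_off) = a, b
    best = None
    if a_node == b_node and b_off >= a_off:
        best = b_off - a_off  # both positions on one node, b downstream of a
    # dist[n] = fewest bases from a to the first base of node n
    dist = {}
    heap = [(lengths[a_node] - a_off, nxt) for nxt in edges.get(a_node, ())]
    heapq.heapify(heap)
    while heap:
        d, node = heapq.heappop(heap)
        if node in dist:
            continue  # already settled with a shorter distance
        dist[node] = d
        for nxt in edges.get(node, ()):
            if nxt not in dist:
                heapq.heappush(heap, (d + lengths[node], nxt))
    if b_node in dist:
        through = dist[b_node] + b_off
        best = through if best is None else min(best, through)
    return best


def cluster_seeds(lengths, edges, seeds, limit):
    """Group seeds so that seeds within `limit` bases (transitively) share a cluster."""
    parent = list(range(len(seeds)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seeds)), 2):
        found = [
            d
            for d in (
                min_distance(lengths, edges, seeds[i], seeds[j]),
                min_distance(lengths, edges, seeds[j], seeds[i]),
            )
            if d is not None
        ]
        if found and min(found) <= limit:
            parent[find(i)] = find(j)

    clusters = {}
    for i, seed in enumerate(seeds):
        clusters.setdefault(find(i), []).append(seed)
    return sorted(clusters.values())


# Toy "bubble": node 1 branches into nodes 2 and 3, which rejoin at node 4.
lengths = {1: 4, 2: 1, 3: 3, 4: 5}
edges = {1: [2, 3], 2: [4], 3: [4]}
seeds = [(1, 1), (4, 2), (4, 4)]

print(min_distance(lengths, edges, (1, 1), (4, 2)))  # 6, via the shorter branch
print(cluster_seeds(lengths, edges, seeds, limit=3))
```

This baseline runs one graph search per pair of seeds, so its cost grows quadratically with the number of seeds. That cost is the motivation for precomputing a distance index, as the abstract describes.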

Funder

National Institutes of Health

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

Link

http://academic.oup.com/bioinformatics/article-pdf/36/Supplement_1/i146/33488679/btaa446.pdf

References: 20 articles.

1. TopCom: Index for Shortest Distance Query in Directed Graph

2. A note on two problems in connexion with graphs;Dijkstra;Numer. Math,1959

3. Efficient algorithms for shortest path queries in planar digraphs

4. Variation graph toolkit improves read mapping by representing genetic variation in the reference;Garrison;Nat. Biotechnol,2018

Cited by 17 articles.

1. Personalized pangenome references;Nature Methods;2024-09-11

2. DNA sequences alignment method using sparse index on pan-genome graph;Journal of Bioinformatics and Computational Biology;2024-08-31

3. Maximum-scoring path sets on pangenome graphs of constant treewidth;Frontiers in Bioinformatics;2024-07-01

4. Label-guided seed-chain-extend alignment on annotated De Bruijn graphs;Bioinformatics;2024-06-28

5. Harp: Leveraging Quasi-Sequential Characteristics to Accelerate Sequence-to-Graph Mapping of Long Reads;Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3;2024-04-27