Finding maximal exact matches in graphs

Author:

Rizzo Nicola, Cáceres Manuel, Mäkinen Veli

Abstract

Background: We study the problem of finding maximal exact matches (MEMs) between a query string $Q$ and a labeled graph $G$. MEMs are an important class of seeds, often used in seed-chain-extend type of practical alignment methods because of their strong connections to classical metrics. A principled way to speed up chaining is to limit the number of MEMs by considering only MEMs of length at least $\kappa$ ($\kappa$-MEMs). However, on arbitrary input graphs, the problem of finding MEMs cannot be solved in truly sub-quadratic time under SETH (Equi et al., TALG 2023) even on acyclic graphs.

Results: In this paper we show an $O(n \cdot L \cdot d^{L-1} + m + M_{\kappa,L})$-time algorithm finding all $\kappa$-MEMs between $Q$ and $G$ spanning exactly $L$ nodes in $G$, where $n$ is the total length of node labels, $d$ is the maximum degree of a node in $G$, $m = |Q|$, and $M_{\kappa,L}$ is the number of output MEMs. We use this algorithm to develop a $\kappa$-MEM finding solution on indexable Elastic Founder Graphs (Equi et al., Algorithmica 2022) running in time $O(nH^2 + m + M_\kappa)$, where $H$ is the maximum number of nodes in a block, and $M_\kappa$ is the total number of $\kappa$-MEMs. Our results generalize to the analysis of multiple query strings (MEMs between $G$ and any of the strings). Additionally, we provide some experimental results showing that the number of graph MEMs is an order of magnitude smaller than the number of string MEMs of the corresponding concatenated collection.

Conclusions: We show that seed-chain-extend type of alignment methods can be implemented on top of indexable Elastic Founder Graphs by providing an efficient way to produce the seeds between a set of queries and the graph. The code is available in https://github.com/algbio/efg-mems.
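To make the notion of a $\kappa$-MEM concrete, the following is a minimal brute-force sketch for the simplest special case, where the graph is a single node whose label is a string. It is not the algorithm of the paper and does not achieve its running time; the function name `kappa_mems` and the example strings are hypothetical, chosen only for illustration. A match is reported as a triple (start in the query, start in the text, length) when it can be extended neither to the left nor to the right and its length is at least `kappa`.

```python
def kappa_mems(q, t, kappa):
    """Brute-force kappa-MEMs between strings q and t (illustrative only).

    Returns triples (i, j, length) with q[i:i+length] == t[j:j+length],
    where the match is maximal on both sides and length >= kappa.
    Assumes kappa >= 1. Runs in O(|q| * |t| * max match length) time.
    """
    mems = []
    for i in range(len(q)):
        for j in range(len(t)):
            # Skip start pairs that could be extended to the left.
            if i > 0 and j > 0 and q[i - 1] == t[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (i + length < len(q) and j + length < len(t)
                   and q[i + length] == t[j + length]):
                length += 1
            if length >= kappa:
                mems.append((i, j, length))
    return mems


if __name__ == "__main__":
    print(kappa_mems("GATTACA", "ATTAGAT", 3))
```

In this toy instance, raising `kappa` from 3 to 4 drops the shorter match "GAT" and keeps only "ATTA", which mirrors how the length threshold limits the number of seeds passed on to chaining.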

Funder

H2020 Marie Skłodowska-Curie Actions

Academy of Finland

University of Helsinki

Publisher

Springer Science and Business Media LLC

