Maximum-scoring path sets on pangenome graphs of constant treewidth
-
Published:2024-07-01
Volume:4
-
ISSN:2673-7647
-
Container-title:Frontiers in Bioinformatics
-
Short-container-title:Front. Bioinform.
Author:
Brejová Broňa,Gagie Travis,Herencsárová Eva,Vinař Tomáš
Abstract
We generalize the problem of finding maximum-scoring segment sets, previously studied by Csűrös (IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2004, 1, 139–150), from sequences to graphs. Namely, given a vertex-weighted graph G and a non-negative startup penalty c, the goal is to find a set of vertex-disjoint paths in G with maximum total score, where each path's score is the total weight of its vertices minus c. We call this new problem maximum-scoring path sets (MSPS). We present an algorithm that runs in linear time on graphs of constant treewidth. The generalization from sequences to graphs allows the algorithm to be used on pangenome graphs representing several related genomes, and MSPS can be seen as a common abstraction for several biological problems on pangenomes, including searching for CpG islands, ChIP-seq data analysis, analysis of region enrichment for functional elements, and simple chaining problems.
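To make the scoring rule concrete, here is a minimal Python sketch of the sequence special case only, in which the graph is a single path and every chosen path is a contiguous segment. It is a standard two-state dynamic program written for illustration, not the treewidth-based algorithm of the article, and the function name `max_scoring_segments` is hypothetical.

```python
from math import inf
from typing import Sequence


def max_scoring_segments(weights: Sequence[float], c: float) -> float:
    """Best total score of disjoint segments of `weights`.

    Each chosen segment scores the sum of its weights minus the
    startup penalty c (assumed non-negative). Choosing no segment
    at all scores 0.
    """
    if c < 0:
        raise ValueError("startup penalty c must be non-negative")

    outside = 0.0   # best score so far with the last position NOT in a segment
    inside = -inf   # best score so far with the last position in an open segment

    for w in weights:
        # Either stay outside, or close the segment that covered the previous position.
        new_outside = max(outside, inside)
        # Either extend the open segment, or pay c to start a new one here.
        new_inside = max(inside, outside - c) + w
        outside, inside = new_outside, new_inside

    return max(outside, inside)


if __name__ == "__main__":
    print(max_scoring_segments([2, -1, 3, -5, 4], 1))
```

With weights 2, -1, 3, -5, 4 and c = 1, one optimal choice is the segments covering the first three positions and the last position, for a score of (4 - 1) + (4 - 1) = 6. Two directly adjacent segments never need separate handling here because, with c non-negative, merging them is at least as good.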
Funder
Agentúra Ministerstva Školstva, Vedy, Výskumu a Športu SR
Agentúra na Podporu Výskumu a Vývoja
National Human Genome Research Institute
Natural Sciences and Engineering Research Council of Canada
HORIZON EUROPE Marie Sklodowska-Curie Actions
Publisher
Frontiers Media SA