Abstract
In many bioinformatics applications, the task is to identify biologically significant locations in an individual genome. In our work, we are interested in finding high-density clusters of such locations in a graph representation of a pangenome, a collection of related genomes. Different formulations of this clustering problem were previously studied for sequences. In this work, we study an extension of the problem to graphs, which we formalize as finding a set of vertex-disjoint paths with maximum score in a weighted directed graph. We provide a linear-time algorithm for a special class of graphs corresponding to elastic-degenerate strings, one of several pangenome representations. We also provide a fixed-parameter tractable algorithm for directed acyclic graphs that admit a special path decomposition of limited width.
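To make the sequence version of the problem concrete, the following is a minimal sketch of a textbook dynamic program for choosing disjoint contiguous segments of maximum total score in a single scored sequence. It is not the algorithm of this work and does not handle graphs. The function name `max_scoring_segments` and the cap `k` on the number of segments are assumptions made for illustration; the abstract does not state which constraint the authors impose.

```python
from typing import Sequence


def max_scoring_segments(scores: Sequence[int], k: int) -> int:
    """Best total score of at most k disjoint contiguous segments.

    Illustrative sequence analogue only; `k` is an assumed constraint.
    Runs in O(n * k) time and O(k) extra space.
    """
    neg_inf = float("-inf")
    # closed[j]: best total over the prefix seen so far using at most j segments.
    closed = [0] * (k + 1)
    # open_[j]: best total when the j-th segment ends exactly at the current position.
    open_ = [neg_inf] * (k + 1)

    for s in scores:
        # Go from high j to low j so closed[j - 1] still refers to the previous position.
        for j in range(k, 0, -1):
            # Either extend the j-th segment or start it here after j - 1 closed segments.
            open_[j] = max(open_[j], closed[j - 1]) + s
            closed[j] = max(closed[j], open_[j])

    return closed[k]
```

For the scores `[2, -3, 4, -1, 5, -10, 3]`, one segment yields 8 (the run `4, -1, 5`), while two segments yield 11 (that run plus the final `3`). In the graph setting described above, segments become vertex-disjoint paths, and a simple left-to-right scan of this kind no longer suffices.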
Publisher
Cold Spring Harbor Laboratory