Abstract
AbstractAdecompositionof a network flow is a set of weighted paths whose superposition equals the flow. The problem of characterising and computing safe walks for flow decompositions has so far seen only a partial solution by restricting the flow decomposition to consist of paths, and the graph to be directed and acyclic (DAG). However, the problem of decomposing into closed walks in a general graph (allowing cycles) is still open.In this paper, we give a simple and linear-time-verifiable complete characterisation (flowtigs) of walks that aresafein such general flow decompositions, i.e. that are subwalks of any possible flow decomposition. Our characterisation generalises over the previous one for DAGs, using a more involved proof of correctness that works around various issues introduced by cycles. We additionally provide an optimalO(mn)-time algorithm that identifies all maximal flowtigs and represents them inside a compact structure. We also implement this algorithm and show that it is very fast in practice.On the practical side, we study flowtigs in the use-case of metagenomic assembly. By using the abundances of the metagenomic assembly graph as flow values, we can model the possible assembly solutions as flow decompositions into closed walks. Compared to reporting unitigs or maximal safe walks based only on the graph structure (structural contigs), reporting flowtigs results in a notably more contiguous assembly. Specifically, on shorter contigs (75-percentile), we get an improvement in assembly contiguity of up to 100% over unitigs, and up to 61.9% over structural contigs. For the 50-percentile of contiguity we get an improvement of up to 17.0% over unitigs and up to 14.6% over structural contigs. These improvements are more pronounced the more complex the assembly graphs are, and the improvements of flowtigs over unitigs are multiple times larger compared to the improvements of previous safe walks over unitigs.
Publisher
Cold Spring Harbor Laboratory
Reference50 articles.
1. Ravindra K. Ahuja , Thomas L. Magnanti , and James B. Orlin . Network Flows: Theory, Algorithms, and Applications. USA: Prentice-Hall, Inc., 1993. ISBN: 013617549X.
2. New approaches for metagenome assembly with short reads
3. Efficient High-Quality Metagenome Assembly from Long Accurate Reads using Minimizer-space de Bruijn Graphs
4. Safety in multi-assembly via paths appearing in all path covers of a DAG
5. Massimo Cairo , Shahbaz Khan , Romeo Rizzi , Sebastian Schmidt , Alexandru I Tomescu , and Elia C Zirondelli . “Cut Paths and Their Remainder Structure, with Applications”. In: 40th International Symposium on Theoretical Aspects of Computer Science (STACS 2023). Schloss Dagstuhl-Leibniz-Zentrum für Informatik. 2023.