Flowtigs: safety in flow decompositions for assembly graphs

Author:

Sena Francisco, Ingervo Eliel, Khan Shahbaz, Prjibelski Andrey, Schmidt Sebastian, Tomescu Alexandru I.

Abstract

A decomposition of a network flow is a set of weighted paths whose superposition equals the flow. The problem of characterising and computing safe walks for flow decompositions has so far seen only a partial solution by restricting the flow decomposition to consist of paths, and the graph to be directed and acyclic (DAG). However, the problem of decomposing into closed walks in a general graph (allowing cycles) is still open.

In this paper, we give a simple and linear-time-verifiable complete characterisation (flowtigs) of walks that are safe in such general flow decompositions, i.e. that are subwalks of any possible flow decomposition. Our characterisation generalises over the previous one for DAGs, using a more involved proof of correctness that works around various issues introduced by cycles. We additionally provide an optimal O(mn)-time algorithm that identifies all maximal flowtigs and represents them inside a compact structure. We also implement this algorithm and show that it is very fast in practice.

On the practical side, we study flowtigs in the use-case of metagenomic assembly. By using the abundances of the metagenomic assembly graph as flow values, we can model the possible assembly solutions as flow decompositions into closed walks. Compared to reporting unitigs or maximal safe walks based only on the graph structure (structural contigs), reporting flowtigs results in a notably more contiguous assembly. Specifically, on shorter contigs (75-percentile), we get an improvement in assembly contiguity of up to 100% over unitigs, and up to 61.9% over structural contigs. For the 50-percentile of contiguity we get an improvement of up to 17.0% over unitigs and up to 14.6% over structural contigs. These improvements are more pronounced the more complex the assembly graphs are, and the improvements of flowtigs over unitigs are multiple times larger compared to the improvements of previous safe walks over unitigs.
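As an illustration of the definition of a flow decomposition, the following is a minimal Python sketch, not the authors' algorithm. It greedily splits a small circulation into weighted simple cycles (a special case of closed walks) and checks that their superposition equals the flow. The function names `decompose_circulation` and `superpose`, the toy graph, and its flow values are assumptions made for this example. The sketch assumes integer flows, no parallel arcs, and flow conservation at every node. It produces one decomposition only; deciding safety would require reasoning about every possible decomposition, which this code does not attempt.

```python
from collections import defaultdict


def decompose_circulation(flow):
    """Greedily split a circulation into weighted simple cycles.

    flow: dict mapping an arc (u, v) to a positive integer flow value.
    Assumes flow conservation at every node and no parallel arcs.
    Returns a list of (arcs, weight) pairs.
    """
    residual = {arc: f for arc, f in flow.items() if f > 0}
    out = defaultdict(list)
    for u, v in residual:
        out[u].append(v)

    cycles = []
    while residual:
        # Start at the tail of any arc that still carries flow.
        start = next(iter(residual))[0]
        path = [start]
        position = {start: 0}
        while True:
            u = path[-1]
            # Conservation guarantees an outgoing arc with residual flow.
            v = next(w for w in out[u] if (u, w) in residual)
            if v in position:
                cycle_nodes = path[position[v]:] + [v]
                break
            position[v] = len(path)
            path.append(v)

        arcs = list(zip(cycle_nodes, cycle_nodes[1:]))
        weight = min(residual[a] for a in arcs)
        for a in arcs:
            residual[a] -= weight
            if residual[a] == 0:
                del residual[a]
        cycles.append((arcs, weight))
    return cycles


def superpose(walks):
    """Sum the weights of all walks on each arc they traverse."""
    total = defaultdict(int)
    for arcs, weight in walks:
        for a in arcs:
            total[a] += weight
    return dict(total)


# Hypothetical toy circulation (inflow equals outflow at every node).
example_flow = {
    ("a", "b"): 3,
    ("b", "c"): 3,
    ("c", "a"): 2,
    ("c", "d"): 1,
    ("d", "a"): 1,
}
example_walks = decompose_circulation(example_flow)
assert superpose(example_walks) == example_flow
```

In this toy graph the arc from a to b and the arc from b to c carry all of the flow, so any closed walk that uses one of them must continue through the other. That is the kind of flow-based argument that can make a walk safe even when the graph structure alone does not.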

Publisher

Cold Spring Harbor Laboratory

