Metagenomic binning with assembly graph embeddings

Author:

Lamurias Andre (1), Sereika Mantas (2), Albertsen Mads (2), Hose Katja (1), Nielsen Thomas Dyhre (1)

Affiliation:

1. Department of Computer Science, Aalborg University, 9000 Aalborg, Denmark

2. Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, 9000 Aalborg, Denmark

Abstract

Motivation: Despite recent advancements in sequencing technologies and assembly methods, obtaining high-quality microbial genomes from metagenomic samples is still not a trivial task. Current metagenomic binners do not take full advantage of assembly graphs and are not optimized for long-read assemblies. Deep graph learning algorithms have been proposed in other fields to deal with complex graph data structures. The graph structure generated during the assembly process could be integrated with contig features to obtain better bins with deep learning.

Results: We propose GraphMB, which uses graph neural networks to incorporate the assembly graph into the binning process. We test GraphMB on long-read datasets of different complexities, and compare the performance with other binners in terms of the number of High Quality (HQ) genome bins obtained. With our approach, we were able to obtain unique bins on all real datasets, and obtain more bins on most datasets. In particular, we obtained on average 17.5% more HQ bins when compared with state-of-the-art binners and 13.7% when aggregating the results of our binner with the others. These results indicate that a deep learning model can integrate contig-specific and graph-structure information to improve metagenomic binning.

Availability and implementation: GraphMB is available from https://github.com/MicrobialDarkMatter/GraphMB.

Supplementary information: Supplementary data are available at Bioinformatics online.

Funder

VILLUM FONDEN

Poul Due Jensen Foundation

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics, Computational Theory and Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Statistics and Probability

