Reference-free structural variant detection in microbiomes via long-read co-assembly graphs

Author:

Curry Kristen D (1, 2), Yu Feiqiao Brian (3), Vance Summer E (4), Segarra Santiago (5), Bhaya Devaki (6), Chikhi Rayan (7), Rocha Eduardo P C (2), Treangen Todd J (1)

Affiliation:

1. Department of Computer Science, Rice University, 6100 Main St., Houston, TX 77005, United States

2. Department of Genomes and Genetics, Microbial Evolutionary Genomics, Institut Pasteur, Université Paris Cité, CNRS, UMR3525, Paris 75015, France

3. Arc Institute, Palo Alto, CA 94304, United States

4. Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA 94720, United States

5. Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, United States

6. Carnegie Institution for Science, Department of Plant Biology, Stanford, CA 94305, United States

7. Department of Computational Biology, Institut Pasteur, Université Paris Cité, Paris 75015, France

Abstract

Motivation: The study of bacterial genome dynamics is vital for understanding the mechanisms underlying microbial adaptation and growth, and their impact on host phenotype. Structural variants (SVs), genomic alterations of 50 base pairs or more, play a pivotal role in driving evolutionary processes and maintaining genomic heterogeneity within bacterial populations. While SV detection in isolate genomes is relatively straightforward, metagenomes present broader challenges due to the absence of clear reference genomes and the presence of mixed strains. In response, our proposed method, rhea, forgoes reference genomes and metagenome-assembled genomes (MAGs) by encompassing all metagenomic samples in a series (ordered by time or another metric) into a single co-assembly graph. The log fold change in graph coverage between successive samples is then calculated to call SVs that are thriving or declining.

Results: We show that rhea outperforms existing methods for SV and horizontal gene transfer (HGT) detection in two simulated mock metagenomes, particularly as the simulated reads diverge from reference genomes and as strain diversity increases. We additionally demonstrate use cases for rhea on series metagenomic data of environmental and fermented food microbiomes to detect specific sequence alterations between successive time and temperature samples, suggesting host advantage. Our approach leverages previous work in assembly graph structural and coverage patterns to provide versatility in studying SVs across diverse and poorly characterized microbial communities for more comprehensive insights into microbial gene flux.

Availability and implementation: rhea is open source and available at: https://github.com/treangenlab/rhea.
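
The log fold change step described in the abstract can be illustrated with a minimal sketch. This is not rhea's implementation: the function names, the pseudocount of 1.0, the threshold of 1.0, and the toy per-node coverage values are all assumptions made for illustration, and the sketch looks only at coverage per graph node, whereas the abstract also refers to structural patterns in the assembly graph.

```python
import math


def log_fold_changes(coverage, pseudocount=1.0):
    """Return log2 fold change in coverage between successive samples.

    coverage: dict mapping a graph node id to a list of coverage values,
    one per sample, in series order (hypothetical input layout).
    pseudocount: assumed smoothing value that avoids division by zero.
    """
    result = {}
    for node, cov in coverage.items():
        result[node] = [
            math.log2((later + pseudocount) / (earlier + pseudocount))
            for earlier, later in zip(cov, cov[1:])
        ]
    return result


def call_changes(lfc, threshold=1.0):
    """Label each (node, step) whose log2 fold change passes the threshold.

    step i compares sample i with sample i + 1. The threshold is an
    assumed cutoff, not a value taken from the paper.
    """
    calls = []
    for node, values in lfc.items():
        for step, value in enumerate(values):
            if value >= threshold:
                calls.append((node, step, "thriving"))
            elif value <= -threshold:
                calls.append((node, step, "declining"))
    return calls


# Toy coverage values for three nodes across three successive samples.
coverage = {
    "node_a": [7.0, 31.0, 31.0],
    "node_b": [15.0, 3.0, 3.0],
    "node_c": [9.0, 9.0, 9.0],
}
lfc = log_fold_changes(coverage)
calls = call_changes(lfc)
print(calls)
```

In this toy input, node_a gains coverage between the first two samples and node_b loses it, so they are labeled thriving and declining respectively, while node_c stays flat and is not called.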

Funder

Ken Kennedy Institute Recruiting

Rice University Wagoner Foreign Study Scholarship

NIH

National Institute of Allergy and Infectious Diseases

NSF

MIM Universal Rules of Life

European Union’s Horizon 2020

Marie Skłodowska-Curie

Carnegie Institution for Science

Department of Energy Joint Genome Institute

Office of Science

Department of Energy

Publisher

Oxford University Press (OUP)

