Unifying the known and unknown microbial coding sequence space

Author:

Vanni Chiara (1,2), Schechter Matthew S (1,3), Acinas Silvia G (4), Barberán Albert (5), Buttigieg Pier Luigi (6), Casamayor Emilio O (7), Delmont Tom O (8), Duarte Carlos M (9), Eren A Murat (3,10), Finn Robert D (11), Kottmann Renzo (1), Mitchell Alex (11), Sánchez Pablo (4), Siren Kimmo (12), Steinegger Martin (13,14), Gloeckner Frank Oliver (2,15,16), Fernàndez-Guerra Antonio (1,17)

Affiliation:

1. Microbial Genomics and Bioinformatics Research G, Max Planck Institute for Marine Microbiology

2. Jacobs University Bremen

3. Department of Medicine, University of Chicago

4. Department of Marine Biology and Oceanography, Institut de Ciències del Mar (CSIC)

5. Department of Environmental Science, University of Arizona

6. Alfred Wegener Institute, Helmholtz Centre for Polar and Marine Research

7. Center for Advanced Studies of Blanes CEAB-CSIC, Spanish Council for Research

8. Génomique Métabolique, Genoscope, Institut François Jacob, CEA, CNRS, Univ Evry, Université Paris-Saclay

9. Red Sea Research Centre and Computational Bioscience Research Center, King Abdullah University of Science and Technology

10. Josephine Bay Paul Center, Marine Biological Laboratory

11. European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus

12. Section for Evolutionary Genomics, The GLOBE Institute, University of Copenhagen

13. School of Biological Sciences, Seoul National University

14. Institute of Molecular Biology and Genetics, Seoul National University

15. University of Bremen and Life Sciences and Chemistry

16. Computing Center, Helmholtz Center for Polar and Marine Research

17. Lundbeck Foundation GeoGenetics Centre, GLOBE Institute, University of Copenhagen

Abstract

Genes of unknown function are among the biggest challenges in molecular biology, especially in microbial systems, where 40–60% of the predicted genes are unknown. Despite previous attempts, systematic approaches to include the unknown fraction into analytical workflows are still lacking. Here, we present a conceptual framework, its translation into the computational workflow AGNOSTOS and a demonstration of how we can bridge the known-unknown gap in genomes and metagenomes. By analyzing 415,971,742 genes predicted from 1749 metagenomes and 28,941 bacterial and archaeal genomes, we quantify the extent of the unknown fraction, its diversity, and its relevance across multiple organisms and environments. The unknown sequence space is exceptionally diverse, phylogenetically more conserved than the known fraction and predominantly taxonomically restricted at the species level. From the 71 M genes identified to be of unknown function, we compiled a collection of 283,874 lineage-specific genes of unknown function for Cand. Patescibacteria (also known as Candidate Phyla Radiation, CPR), which provides a significant resource to expand our understanding of their unusual biology. Finally, by identifying a target gene of unknown function for antibiotic resistance, we demonstrate how we can enable the generation of hypotheses that can be used to augment experimental data.

Funder

Max Planck Society

Horizon 2020

Biotechnology and Biological Sciences Research Council

European Molecular Biology Laboratory

Spanish Agency of Science MICIU/AEI/FEDER

Spanish Ministry of Economy and Competitiveness

Publisher

eLife Sciences Publications, Ltd

Subject

General Immunology and Microbiology; General Biochemistry, Genetics and Molecular Biology; General Medicine; General Neuroscience

