Abstract
Most microbes on our planet remain uncultured and poorly studied. Recent efforts to catalog their genetic diversity have revealed that a significant fraction of the observed microbial genes are functionally and evolutionarily untraceable, lacking homologs in reference databases. Despite their potential biological value, these apparently unrelated orphan genes from uncultivated taxa have been routinely discarded in metagenomics surveys. Here, we analyzed a global multi-habitat dataset covering 151,697 medium- and high-quality metagenome-assembled genomes (MAGs), 5,969 single-amplified genomes (SAGs), and 19,642 reference genomes, and identified 413,335 highly curated novel protein families under strong purifying selection among genes previously considered orphans. These new protein families, representing a three-fold increase over the total number of prokaryotic orthologous groups described to date, are spread across the prokaryotic phylogeny, can span multiple habitats, and are notably overrepresented in recently discovered taxa. By genomic context analysis, we mapped thousands of unknown protein families to phylogenetically conserved operons linked to energy production, xenobiotic metabolism, and microbial resistance. Most remarkably, we found 980 previously neglected protein families that can accurately distinguish entire uncultivated phyla, classes, and orders, likely representing synapomorphic traits that fostered their divergence. The systematic curation and evolutionary analysis of the unique genetic repertoire of uncultivated taxa open new avenues for understanding the biology and ecological roles of poorly explored lineages at a global scale.
Publisher
Cold Spring Harbor Laboratory
Cited by
11 articles.