Unraveling the functional dark matter through global metagenomics
Authors:
Pavlopoulos Georgios A., Baltoumas Fotis A., Liu Sirui, Selvitopi Oguz, Camargo Antonio Pedro, Nayfach Stephen, Azad Ariful, Roux Simon, Call Lee, Ivanova Natalia N., Chen I. Min, Paez-Espino David, Karatzas Evangelos, Acinas Silvia G., Ahlgren Nathan, Attwood Graeme, Baldrian Petr, Berry Timothy, Bhatnagar Jennifer M., Bhaya Devaki, Bidle Kay D., Blanchard Jeffrey L., Boyd Eric S., Bowen Jennifer L., Bowman Jeff, Brawley Susan H., Brodie Eoin L., Brune Andreas, Bryant Donald A., Buchan Alison, Cadillo-Quiroz Hinsby, Campbell Barbara J., Cavicchioli Ricardo, Chuckran Peter F., Coleman Maureen, Crowe Sean, Colman Daniel R., Currie Cameron R., Dangl Jeff, Delherbe Nathalie, Denef Vincent J., Dijkstra Paul, Distel Daniel D., Eloe-Fadrosh Emiley, Fisher Kirsten, Francis Christopher, Garoutte Aaron, Gaudin Amelie, Gerwick Lena, Godoy-Vitorino Filipa, Guerra Peter, Guo Jiarong, Habteselassie Mussie Y., Hallam Steven J., Hatzenpichler Roland, Hentschel Ute, Hess Matthias, Hirsch Ann M., Hug Laura A., Hultman Jenni, Hunt Dana E., Huntemann Marcel, Inskeep William P., James Timothy Y., Jansson Janet, Johnston Eric R., Kalyuzhnaya Marina, Kelly Charlene N., Kelly Robert M., Klassen Jonathan L., Nüsslein Klaus, Kostka Joel E., Lindow Steven, Lilleskov Erik, Lynes Mackenzie, Mackelprang Rachel, Martin Francis M., Mason Olivia U., McKay R. Michael, McMahon Katherine, Mead David A., Medina Monica, Meredith Laura K., Mock Thomas, Mohn William W., Moran Mary Ann, Murray Alison, Neufeld Josh D., Neumann Rebecca, Norton Jeanette M., Partida-Martinez Laila P., Pietrasiak Nicole, Pelletier Dale, Reddy T. B. K., Reese Brandi Kiel, Reichart Nicholas J., Reiss Rebecca, Saito Mak A., Schachtman Daniel P., Seshadri Rekha, Shade Ashley, Sherman David, Simister Rachel, Simon Holly, Stegen James, Stepanauskas Ramunas, Sullivan Matthew, Sumner Dawn Y., Teeling Hanno, Thamatrakoln Kimberlee, Treseder Kathleen, Tringe Susannah, Vaishampayan Parag, Valentine David L., Waldo Nicholas B., Waldrop Mark P., Walsh David A., Ward David M., Wilkins Michael, Whitman Thea, Woolet Jamie, Woyke Tanja, Iliopoulos Ioannis, Konstantinidis Konstantinos, Tiedje James M., Pett-Ridge Jennifer, Baker David, Visel Axel, Ouzounis Christos A., Ovchinnikov Sergey, Buluç Aydin, Kyrpides Nikos C.
Abstract
Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.
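The abstract summarizes a pipeline with three steps: discard proteins of 35 amino acids or fewer and any protein with similarity to the reference genomes or Pfam, cluster the remaining proteins on a similarity graph, and keep only clusters with more than 100 members. The sketch below illustrates those steps at toy scale. It assumes the all-vs-all similarity hits and the set of reference-matching protein IDs have already been computed. It also uses a small dense Markov clustering (MCL) routine as a stand-in for the paper's "massively parallel graph-based clustering"; this is an assumption, not a description of the authors' implementation. The names `dark_matter_families` and `markov_cluster` and all their parameters are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic), in place."""
    n = len(m)
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                m[i][j] /= total
    return m


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense square matrix product; O(n^3), suitable only for toy graphs."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(n: int, edges: List[Tuple[int, int, float]],
                   inflation: float = 2.0, max_iter: int = 100,
                   tol: float = 1e-9) -> List[List[int]]:
    """Toy Markov clustering: alternate expansion and inflation until stable."""
    if n == 0:
        return []
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0  # self-loops, as is customary in MCL
    for u, v, w in edges:
        m[u][v] = m[v][u] = w  # undirected, weighted similarity graph
    normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: two-step random walk
        inflated = normalize_columns(  # inflation: sharpen strong flows
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep non-negligible mass are attractors; the columns with
    # mass in such a row are the nodes flowing to it, i.e. one cluster.
    clusters: List[List[int]] = []
    seen: Set[frozenset] = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in seen:
            seen.add(members)
            clusters.append(sorted(members))
    return clusters


def dark_matter_families(proteins: Dict[str, str], reference_hits: Set[str],
                         similarity: List[Tuple[str, str, float]],
                         min_length: int = 36,
                         min_members: int = 100) -> List[List[str]]:
    """Hypothetical pipeline: filter, cluster, keep clusters > min_members."""
    # 1. Keep proteins longer than 35 aa (>= min_length residues) that have
    #    no hit to any reference genome or Pfam entry.
    candidates = sorted(pid for pid, seq in proteins.items()
                        if len(seq) >= min_length and pid not in reference_hits)
    index = {pid: i for i, pid in enumerate(candidates)}
    # 2. Restrict the similarity graph to edges between candidates.
    edges = [(index[a], index[b], w) for a, b, w in similarity
             if a in index and b in index]
    # 3. Cluster the graph, then 4. keep clusters with more than min_members.
    clusters = markov_cluster(len(candidates), edges)
    return [[candidates[j] for j in c] for c in clusters
            if len(c) > min_members]


if __name__ == "__main__":
    prots = {pid: "M" * 40 for pid in "abcdefg"}
    prots["h"] = "M" * 20  # too short, filtered out
    prots["r"] = "M" * 50  # matches a reference, filtered out
    sims = [("a", "b", 1.0), ("a", "c", 1.0), ("b", "c", 1.0),
            ("d", "e", 1.0), ("d", "f", 1.0), ("e", "f", 1.0),
            ("a", "h", 1.0), ("a", "r", 1.0)]
    print(dark_matter_families(prots, {"r"}, sims, min_members=2))
```

Running the script prints the two three-member toy families, `[['a', 'b', 'c'], ['d', 'e', 'f']]`. The dense matrix product makes each iteration cubic in the number of proteins. At the scale reported above (1.17 billion sequences), a distributed sparse implementation would be needed, which is why the abstract stresses massively parallel clustering.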
Publisher
Springer Science and Business Media LLC
Subject
Multidisciplinary
References (67 articles)
1. New, F. N. & Brito, I. L. What is metagenomics teaching us, and what is missed? Annu. Rev. Microbiol. 74, 117–135 (2020).
2. Rinke, C. et al. Insights into the phylogeny and coding potential of microbial dark matter. Nature 499, 431–437 (2013).
3. Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
4. Meyer, F. et al. MG-RAST version 4—lessons learned from a decade of low-budget ultra-high-throughput metagenome analysis. Brief. Bioinform. 20, 1151–1159 (2019).
5. Ayling, M., Clark, M. D. & Leggett, R. M. New approaches for metagenome assembly with short reads. Brief. Bioinform. 21, 584–594 (2020).
Cited by
47 articles.