Abstract
Bacteriophages drive evolutionary change in bacterial communities by creating gene flow networks that fuel ecological adaptations. However, the extent of viral diversity and prevalence in the human gut remains largely unknown. Here, we introduce the Gut Phage Database (GPD), a collection of ∼142,000 non-redundant viral genomes (>10 kb) obtained by mining a dataset of 28,060 globally distributed human gut metagenomes and 2,898 reference genomes of cultured gut bacteria. Host assignment revealed that viral diversity is highest in the phylum Firmicutes and that ∼36% of viral clusters (VCs) are not restricted to a single species, creating gene flow networks across phylogenetically distinct bacterial species. Epidemiological analysis uncovered 280 globally distributed VCs found on at least 5 continents and a highly prevalent novel phage clade with features reminiscent of p-crAssphage. This high-quality, large-scale catalogue of phage genomes will improve future virome studies and enable ecological and evolutionary analysis of human gut bacteriophages.
Publisher
Cold Spring Harbor Laboratory