Abstract
Background: Recent metagenomic studies have identified a huge number of viruses. Yet the true genetic diversity of the global viral community has not been systematically assessed. Here, we explored the genome and protein spaces of viruses by simulating the process of virus discovery in viral metagenomic studies.

Results: We estimated that there are at least 3.52 × 10^6 viral Operational Taxonomic Units (vOTUs) and 2.16 × 10^7 viral protein clusters (vPCs) on Earth, and that only about 30% of this viral genetic diversity has been identified so far. Balancing cost against the number of novel viruses detected, an additional 3.89 × 10^5 samples are required to capture 84% of the total genetic diversity. Estimates obtained separately by taxonomy and by ecosystem were consistent with these overall figures.

Conclusions: This study is the first to explore the viral genome and protein spaces and to estimate the total number of vOTUs and vPCs on Earth at the point where the viral genetic space is saturated. It provides a guide for future sequencing efforts in virus discovery and contributes to a better understanding of viral diversity in nature.
Publisher
Cold Spring Harbor Laboratory