Conservative taxonomy and quality assessment of giant virus genomes with GVClass

Author:

Pitot Thomas M, Brůna Tomáš, Schulz Frederik

Abstract

Background

Large double-stranded DNA viruses of the phylum Nucleocytoviricota (giant viruses; GVs) include the largest known viruses in terms of both capsid and genome size, and they are associated with a wide range of eukaryotic hosts. Orders of GVs that infect protists and algae have been shown to dominate GVs in environmental samples. These viruses encode genes that may have significantly affected biogeochemical cycling and host genome evolution. Although GVs are frequently found in environmental sequence data, their large and complex genomes, composed of genes acquired from various cellular lineages, make them difficult to identify and classify taxonomically.

Results

We present GVClass, a tool that identifies giant viruses in sequence data, assigns taxonomy, and estimates genome completeness and contamination. GVClass performs gene calling optimized for giant viruses and uses a conservative approach based on consensus single-protein phylogenies for robust taxonomic assignment. The genes used for classification represent highly conserved giant virus orthologous groups and low-copy-number cellular and viral panorthologs. In our benchmarking, GVClass produced high-quality, accurate taxonomic assignments of giant virus sequences with high to very high precision. Over 90% of tested instances were correctly predicted at the genus level, and predictions at higher taxonomic ranks (family, order, class) were near-perfect (>99%).

Conclusion

In light of the rapidly increasing amount of sequence data and associated metagenome-assembled genomes, GVClass provides a conservative approach to identify, classify and quality-check giant virus genomes, which often remain unassigned or misclassified by other methods. GVClass has already been used for viral meta-analysis and to benchmark the viral sequence detection pipeline geNomad. The standalone version is freely available, and GVClass has been integrated into the Integrated Microbial Genomes / Virus database (IMG/VR), where users can upload their own data for giant virus classification.
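The abstract does not say exactly how the single-protein phylogenies are combined or how completeness and contamination are computed. The following is a minimal sketch, assuming that (1) each marker protein tree yields one lineage for the query (for example, from its nearest reference neighbour), and (2) a CheckM-style count of expected marker genes is used for quality estimates. The function names `consensus_lineage` and `completeness_contamination`, the simplified `RANKS` tuple and the 0.67 agreement threshold are all hypothetical illustrations, not GVClass's actual implementation.

```python
from collections import Counter

# Hypothetical, simplified rank order used only for this sketch.
RANKS = ("phylum", "class", "order", "family", "genus")


def consensus_lineage(marker_lineages, threshold=0.67):
    """Return the deepest (rank, label) prefix supported by enough markers.

    marker_lineages: one lineage tuple per marker protein tree, giving the
    taxonomy of the query's nearest reference neighbour, highest rank first.
    threshold: fraction of *all* markers that must agree at a rank.
    """
    if not marker_lineages:
        return []
    assigned = []
    for depth, rank in enumerate(RANKS):
        prefix = tuple(label for _, label in assigned)
        # Only markers consistent with the ranks already assigned may vote.
        votes = [
            lineage[depth]
            for lineage in marker_lineages
            if len(lineage) > depth and tuple(lineage[:depth]) == prefix
        ]
        if not votes:
            break
        label, count = Counter(votes).most_common(1)[0]
        # Dividing by all markers (not just the voters) keeps the call conservative.
        if count / len(marker_lineages) < threshold:
            break
        assigned.append((rank, label))
    return assigned


def completeness_contamination(marker_counts, expected_markers):
    """CheckM-style estimate from copy numbers of expected marker genes.

    marker_counts: dict mapping marker name -> copies found in the genome.
    expected_markers: set of markers expected once per complete genome.
    Returns (completeness %, contamination %).
    """
    if not expected_markers:
        raise ValueError("expected_markers must not be empty")
    present = [m for m in expected_markers if marker_counts.get(m, 0) > 0]
    completeness = 100 * len(present) / len(expected_markers)
    # Extra copies of expected single-copy markers are counted as contamination.
    extra_copies = sum(marker_counts[m] - 1 for m in present)
    contamination = 100 * extra_copies / len(expected_markers)
    return completeness, contamination
```

Suppose three marker trees all place a query in family Mimiviridae, but two point to genus Mimivirus and one to Megavirus. The taxon labels here are only illustrative. With the default threshold, `consensus_lineage` stops at family, because 2 of 3 markers falls below 0.67. This is the kind of conservative behaviour the abstract describes: the sketch reports a confident higher-rank assignment rather than a shaky genus call. Because the abstract describes some markers as low-copy-number rather than strictly single-copy, treating every extra copy as contamination is a simplifying assumption of this sketch.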

Publisher

Cold Spring Harbor Laboratory