Multifaceted quality assessment of gene repertoire annotation with OMArk

Published: 2022-11-28

Author:

Nevers Yannis, Rossier Victor, Train Clément Marie, Altenhoff Adrian, Dessimoz Christophe, Glover Natasha

Abstract

Assessing the quality of protein-coding gene repertoires is critical in an era of increasingly abundant genome sequences for a diversity of species. State-of-the-art genome annotation assessment tools measure the completeness of a gene repertoire, but are blind to other types of errors, such as gene over-prediction or contamination.

We developed OMArk, a software tool that relies on fast, alignment-free sequence comparisons between a query proteome and precomputed gene families across the tree of life. OMArk assesses not only the completeness, but also the consistency of the gene repertoire as a whole relative to closely related species. It also reports likely contamination events.

We validated OMArk with simulated data, then performed an analysis of the 1805 UniProt Eukaryotic Reference Proteomes, illustrating its usefulness for comparing and prioritizing proteomes based on their quality measures. In particular, we found strong evidence of contamination in 59 proteomes, and identified error propagation in avian gene annotation resulting from the use of a fragmented zebra finch proteome as reference.

OMArk is available on GitHub (https://github.com/DessimozLab/OMArk), as a Python package on PyPI, and as an interactive online tool at https://omark.omabrowser.org/.

Publisher

Cold Spring Harbor Laboratory

Cited by 13 articles.

1. De novo genome sequence assembly of the RNAi-tractable endosymbiosis model system Paramecium bursaria 186b reveals factors shaping intron repertoire;2024-08-09

2. A haplotype-resolved reference genome of a long-distance migratory bat, Pipistrellus nathusii (Keyserling & Blasius, 1839);DNA Research;2024-06-07

3. Expanding the Triangle of U: The genome assembly of Hirschfeldia incana provides insights into chromosomal evolution, phylogenomics and high photosynthesis-related traits;2024-05-18

4. Orthology inference at scale with FastOMA;2024-01-31

5. Gene modelling and annotation for the Hawaiian bobtail squid, Euprymna scolopes;Scientific Data;2024-01-06