Affiliation:
1. National Institute of Standards and Technology, NIST Charleston, 331 Fort Johnson Road, Charleston, SC 29412, USA
Abstract
The last decade has witnessed dramatic improvements in whole-genome sequencing capabilities coupled with drastically decreased costs, leading to an inundation of high-quality de novo genomes. For this reason, the continued development of genome quality metrics is imperative. Using the 2016 Atlantic bottlenose dolphin NCBI RefSeq annotation and mass spectrometry-based proteomic analysis of six tissues, we confirmed 10,402 proteins from 4,711 protein groups, constituting nearly one-third of the possible predicted proteins. Since the identification of larger proteins with more identified peptides implies reduced database fragmentation and improved gene annotation accuracy, we propose the metric NP10, which attempts to capture this quality improvement. NP10 is calculated by first ranking the identified proteins by the number of peptides per protein and selecting the top decile (the 10th 10-quantile), and then taking the median molecular weight of the selected proteins. When the 2016 rather than the 2012 Tursiops truncatus genome annotation was used to search this proteomic data set, NP10 improved by 21%. The metric was further demonstrated by using a publicly available proteomic data set to compare human genome annotations from 2004, 2013 and 2016, which showed a 33% improvement in NP10. These results demonstrate that proteomics may be a useful metrological tool to benchmark genome accuracy, though there is a need for reference proteomic data sets across species to facilitate the evaluation of new de novo and existing genomes.
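The NP10 calculation described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `np10`, the `(peptide_count, molecular_weight)` input layout, the rounding of the decile size with `ceil`, and the arbitrary handling of ties at the decile boundary are all assumptions made here.

```python
import math
from statistics import median


def np10(proteins):
    """Return the median molecular weight of the top decile of proteins,
    ranked by number of identified peptides per protein.

    `proteins` is an iterable of (peptide_count, molecular_weight) tuples.
    The input layout and the rounding rule are assumptions for illustration.
    """
    # Rank proteins from most to fewest identified peptides.
    ranked = sorted(proteins, key=lambda p: p[0], reverse=True)
    if not ranked:
        raise ValueError("no identified proteins")

    # Size of the top decile; ceil keeps at least one protein (assumed rule).
    top_n = max(1, math.ceil(len(ranked) * 0.10))
    top_decile = ranked[:top_n]

    # NP10 is the median molecular weight of the selected proteins.
    return median(mw for _, mw in top_decile)


# Hypothetical toy data: 20 proteins with peptide counts 1..20 and
# molecular weights (in kDa) equal to 10 times the peptide count.
toy_proteins = [(i, 10.0 * i) for i in range(1, 21)]
toy_np10 = np10(toy_proteins)
```

With the toy data, the top decile holds the two proteins with 20 and 19 peptides, so the result is the median of their molecular weights. Comparing NP10 values obtained by searching the same spectra against two annotations gives the relative improvement reported in the abstract.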
Funder
National Institute of Standards and Technology
Subject
Genetics (clinical), Genetics