Abstract
Motivation: Inferring taxonomy in mass spectrometry-based shotgun proteomics is a complex task. In multi-species or viral samples of unknown taxonomic origin, the presence of proteins and their corresponding taxa must be inferred from a list of identified peptides. This inference is often complicated by protein homology: many proteins share peptides not only within a taxon but also between taxa. However, correct taxonomic identification is crucial when distinguishing viral strains with high sequence homology; consider, for example, the different epidemiological characteristics of the various strains of SARS-CoV-2. Additionally, many viruses mutate frequently, further complicating the correct assignment of virus proteomic samples.
Results: We present PepGM, a probabilistic graphical model for the taxonomic assignment of virus proteomic samples with strain-level resolution and associated confidence scores. PepGM combines the results of a standard proteomic database search algorithm with belief propagation to calculate the marginal distributions, and thus confidence scores, for potential taxonomic assignments. We demonstrate the performance of PepGM on several publicly available virus proteomic datasets, showing that it achieves strain-level resolution. In two out of eight cases, the taxonomic assignments were correct only at the species level, which PepGM clearly indicates by lower confidence scores.
Availability and Implementation: PepGM is written in Python and embedded into a Snakemake workflow. It is available at https://github.com/BAMeScience/PepGM.
Publisher
Cold Spring Harbor Laboratory