A Pre-computed Probabilistic Molecular Search Engine for Tandem Mass Spectrometry Proteomics

Published: 2020-02-07

Author:

Jones Jeff

Abstract

Mass spectrometry methods of peptide identification involve comparing observed tandem spectra with in silico-derived spectrum models. Presented here is a proteomics search engine that offers a new variation of the standard approach, with improved results. The proposed method applies information theory and probabilistic information retrieval to a pre-computed, indexed fragmentation database, generating a peptide-to-spectrum match (PSM) score modeled on fragment ion frequency. This allows modern document-mining techniques to be applied directly: the collection of peptides is treated as a corpus and the corresponding fragment ions as indexable words, so that ready-built search engines and common predefined ranking algorithms can be leveraged. Fast and accurate PSMs are achieved, yielding a 5-10% higher rate of peptide identifications than current database mining methods. Immediate applications of this search engine are aimed at identifying peptides from large sequence databases consisting of homologous proteins with minor sequence variations, such as the genetic variation expected in the human population.
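The abstract does not specify the fragmentation model, the way ions become "words", or which ranking function is used. The following is only a minimal sketch of the general corpus/word idea under loudly stated assumptions: peptides are fragmented into singly charged, unmodified b- and y-ions; each m/z value is discretized into an integer bin (the 0.05 Da `BIN_WIDTH` is a hypothetical choice); and Okapi BM25 stands in for the "common predefined ranking algorithms". The names `fragment_ions`, `to_token` and `FragmentIndex` are hypothetical illustrations, not the author's implementation, and precursor-mass filtering, noise and charge states are ignored.

```python
import math
from collections import Counter, defaultdict

# Monoisotopic residue masses (Da) for the 20 standard amino acids, unmodified.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565
BIN_WIDTH = 0.05  # hypothetical m/z bin width that turns a fragment into a "word"


def fragment_ions(peptide):
    """Singly charged b- and y-ion m/z values (assumed: no modifications or neutral losses)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b_i: N-terminal fragment
        ions.append(sum(masses[-i:]) + WATER + PROTON)  # y_i: C-terminal fragment
    return ions


def to_token(mz):
    """Discretize an m/z value into an integer bin id, the indexable 'word'."""
    return int(round(mz / BIN_WIDTH))


class FragmentIndex:
    """Pre-computed inverted index of fragment tokens; BM25 is an assumed stand-in ranker."""

    def __init__(self, peptides, k1=1.2, b=0.75):
        self.peptides = list(peptides)
        self.k1, self.b = k1, b
        # Term frequencies per peptide "document".
        self.tf = [Counter(to_token(mz) for mz in fragment_ions(p)) for p in self.peptides]
        self.doc_len = [sum(c.values()) for c in self.tf]
        self.avg_len = sum(self.doc_len) / len(self.doc_len)
        self.postings = defaultdict(list)  # token -> ids of peptides containing it
        for doc_id, counts in enumerate(self.tf):
            for token in counts:
                self.postings[token].append(doc_id)

    def idf(self, token):
        # Rare fragment tokens carry more information than common ones.
        n = len(self.peptides)
        df = len(self.postings.get(token, ()))
        return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

    def search(self, peaks, top_k=5):
        """Rank peptides that share at least one token with the observed peaks."""
        scores = defaultdict(float)
        for token in {to_token(mz) for mz in peaks}:
            idf = self.idf(token)
            for doc_id in self.postings.get(token, ()):
                tf = self.tf[doc_id][token]
                norm = self.k1 * (1 - self.b + self.b * self.doc_len[doc_id] / self.avg_len)
                scores[doc_id] += idf * tf * (self.k1 + 1) / (tf + norm)
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        return [(self.peptides[d], s) for d, s in ranked[:top_k]]


index = FragmentIndex(["PEPTIDEK", "PEPTIDER", "SAMPLER", "ELVISLIVESK"])
print(index.search(fragment_ions("PEPTIDEK"), top_k=3))  # idealized, noise-free spectrum
```

With an idealized spectrum of PEPTIDEK, the exact peptide ranks first and the single-residue variant PEPTIDER ranks second, because it shares only the b-ion ladder. This hints at why fragment-level indexing suits databases of closely homologous sequences. Fixed-width binning can split a peak from its theoretical value at a bin boundary; a real engine would need a tolerance-aware lookup.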

Publisher

Cold Spring Harbor Laboratory
