A learned score function improves the power of mass spectrometry database search

Published: 2024-06-28 Issue: Supplement_1 Volume: 40 Pages: i410-i417
ISSN: 1367-4803
Container-title: Bioinformatics
Language: en

Author:

Ananth Varun¹, Sanders Justin¹, Yilmaz Melih¹, Wen Bo², Oh Sewoong¹, Noble William Stafford¹,²

Affiliation:

1. Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, USA

2. Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA

Abstract

Motivation

One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing use machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesized that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools.

Results

To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides.
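
The idea of re-purposing a de novo sequencing model as a database-search score function can be illustrated with a minimal Python sketch. It assumes a hypothetical autoregressive model that, given a spectrum and the residues decoded so far, returns a probability for each possible next residue. The names `score_peptide`, `best_match`, and `toy_model` are invented for illustration and are not part of Casanovo. The mean log-probability scoring rule is likewise an assumption made here, not necessarily the scoring scheme used by Casanovo-DB.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical model interface (an assumption, not Casanovo's API):
# (spectrum, prefix of residues so far) -> probability of each next residue.
NextResidueModel = Callable[[Sequence[float], str], Dict[str, float]]

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def score_peptide(model: NextResidueModel,
                  spectrum: Sequence[float],
                  peptide: str) -> float:
    """Mean log-probability of the peptide's residues when the model is fed
    the true prefix at every step (teacher forcing)."""
    total = 0.0
    for i, residue in enumerate(peptide):
        probs = model(spectrum, peptide[:i])
        # Floor the probability to avoid log(0) for residues the model rules out.
        total += math.log(max(probs.get(residue, 0.0), 1e-12))
    return total / len(peptide)


def best_match(model: NextResidueModel,
               spectrum: Sequence[float],
               candidates: List[str]) -> Tuple[str, float]:
    """Database-search step: score every candidate peptide, keep the best."""
    scored = [(p, score_peptide(model, spectrum, p)) for p in candidates]
    return max(scored, key=lambda item: item[1])


def toy_model(spectrum: Sequence[float], prefix: str) -> Dict[str, float]:
    """Stand-in for a trained model: ignores the spectrum and strongly
    favours the residues of one fixed peptide."""
    target = "PEPTIDE"
    pos = len(prefix)
    if pos >= len(target):
        return {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}
    probs = {aa: 0.1 / (len(ALPHABET) - 1) for aa in ALPHABET}
    probs[target[pos]] = 0.9
    return probs


# Candidates would normally be database peptides whose mass matches the
# spectrum's precursor mass; here they are hard-coded for illustration.
peptide, score = best_match(toy_model, [], ["PEPTIDE", "PEPTIDA", "AAAAAAA"])
```

In this sketch the candidate set plays the role of the peptide database, so the learned model only ranks peptides that a conventional search would also consider. The model proposes no sequences of its own.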

Funder

National Science Foundation

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/bioinformatics/article-pdf/40/Supplement_1/i410/58354735/btae218.pdf

References (35 articles)

1. Andromeda: a peptide search engine integrated into the MaxQuant environment; Cox; J Proteome Res, 2011

2. Tandem: matching proteins with tandem mass spectra; Craig; Bioinformatics, 2004

3. Faster SEQUEST searching for peptide identification from tandem mass spectra; Diament; J Proteome Res, 2011

4. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry; Elias; Nat Methods, 2007