A learned score function improves the power of mass spectrometry database search

Authors:

Ananth Varun, Sanders Justin, Yilmaz Melih, Oh Sewoong, Noble William Stafford

Abstract

One of the core problems in the analysis of protein tandem mass spectrometry data is the peptide assignment problem: determining, for each observed spectrum, the peptide sequence that was responsible for generating the spectrum. Two primary classes of methods are used to solve this problem: database search and de novo peptide sequencing. State-of-the-art methods for de novo sequencing employ machine learning methods, whereas most database search engines use hand-designed score functions to evaluate the quality of a match between an observed spectrum and a candidate peptide from the database. We hypothesize that machine learning models for de novo sequencing implicitly learn a score function that captures the relationship between peptides and spectra, and thus may be re-purposed as a score function for database search. Because this score function is trained from massive amounts of mass spectrometry data, it could potentially outperform existing, hand-designed database search tools. To test this hypothesis, we re-engineered Casanovo, which has been shown to provide state-of-the-art de novo sequencing capabilities, to assign scores to given peptide-spectrum pairs. We then evaluated the statistical power of this Casanovo score function, Casanovo-DB, to detect peptides on a benchmark of three mass spectrometry runs from three different species. Our results show that, at a 1% peptide-level false discovery rate threshold, Casanovo-DB outperforms existing hand-designed score functions by 35% to 88%. In addition, we show that re-scoring with the Percolator post-processor benefits Casanovo-DB more than other score functions, further increasing the number of detected peptides.
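The idea of re-purposing a de novo sequencing model as a database search score function can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the following is an assumption for illustration only and not Casanovo-DB's actual implementation: it supposes that the model exposes a next-residue probability distribution given the residues decoded so far, and it scores a candidate peptide by its mean per-residue log-probability. The names `score_peptide`, `best_match`, and `toy_model` are hypothetical, and the toy model ignores the spectrum entirely, whereas a real model would condition on it.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a trained de novo model: given the residues
# decoded so far, return a probability for each possible next residue.
# A real model would also condition on the observed spectrum.
NextResidueModel = Callable[[str], Dict[str, float]]

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def score_peptide(peptide: str, model: NextResidueModel) -> float:
    """Mean per-residue log-probability of `peptide` under `model`.

    Assumed scoring rule for illustration; not taken from the source.
    """
    if not peptide:
        raise ValueError("peptide must be non-empty")
    total = 0.0
    for i, residue in enumerate(peptide):
        probs = model(peptide[:i])
        # Floor the probability so a residue the model never predicts
        # yields a very low score instead of a math domain error.
        total += math.log(max(probs.get(residue, 0.0), 1e-12))
    return total / len(peptide)


def best_match(candidates: List[str], model: NextResidueModel) -> Tuple[str, float]:
    """Score every database candidate and return the best (peptide, score)."""
    scored = [(score_peptide(p, model), p) for p in candidates]
    score, peptide = max(scored)
    return peptide, score


def toy_model(prefix: str) -> Dict[str, float]:
    """Toy distribution that favours the sequence 'PEPTIDE'."""
    target = "PEPTIDE"
    if len(prefix) >= len(target):
        # Past the end of the target: fall back to a uniform distribution.
        return {aa: 1.0 / len(ALPHABET) for aa in ALPHABET}
    probs = {aa: 0.01 for aa in ALPHABET}
    probs[target[len(prefix)]] = 0.81  # 19 * 0.01 + 0.81 = 1.0
    return probs


if __name__ == "__main__":
    candidates = ["PEPTIDE", "PEPTIDA", "AAAAAAA"]
    print(best_match(candidates, toy_model))
```

In an actual database search, the candidate list would typically be restricted to database peptides whose mass matches the spectrum's precursor mass, and the resulting top scores would then be passed to a false discovery rate procedure; both steps are omitted here.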

Publisher

Cold Spring Harbor Laboratory

