SpecEncoder: deep metric learning for accurate peptide identification in proteomics

Published:2024-06-28 Issue:Supplement_1 Volume:40 Page:i257-i265
ISSN:1367-4803
Container-title:Bioinformatics
language:en

Author:

Liu Kaiyuan¹, Tao Chenghua¹, Ye Yuzhen¹, Tang Haixu¹

Affiliation:

1. Department of Computer Science, Luddy School of Informatics, Computing and Engineering, Indiana University, IN 47408, United States

Abstract

Motivation: Tandem mass spectrometry (MS/MS) is a crucial technology for large-scale proteomic analysis. Protein database search and spectral library search are commonly used to identify peptides from MS/MS spectra; both, however, face challenges from experimental variation between replicate spectra and from similar fragmentation patterns among distinct peptides. To address these challenges, we present SpecEncoder, a deep metric learning approach that transforms MS/MS spectra into robust and sensitive embedding vectors in a latent space. The SpecEncoder model can also embed predicted MS/MS spectra of peptides, enabling a hybrid search approach that combines spectral library and protein database searches for peptide identification.

Results: We evaluated SpecEncoder on three large human proteomics datasets, and the results showed a consistent improvement in peptide identification. For spectral library search, SpecEncoder identifies 1%–2% more unique peptides (and PSMs) than SpectraST. For protein database search, it identifies 6%–15% more unique peptides than MSGF+ enhanced by Percolator. Furthermore, SpecEncoder identified 6%–12% additional unique peptides when utilizing a combined library of experimental and predicted spectra. SpecEncoder can also identify more peptides than deep-learning-enhanced methods (MSFragger boosted by MSBooster). These results demonstrate SpecEncoder’s potential to enhance peptide identification for proteomic data analyses.

Availability and Implementation: The source code and scripts for SpecEncoder and peptide identification are available on GitHub at https://github.com/lkytal/SpecEncoder.

Contact: hatang@iu.edu
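
To make the idea of searching in an embedding space concrete, the following is a minimal sketch of matching a query spectrum embedding against a library of peptide embeddings. It is not SpecEncoder's code or API: the function names, the dictionary layout of the library, the peptide labels, the toy three-dimensional vectors, and the use of cosine similarity as the scoring function are all assumptions made for illustration. In a hybrid search of the kind the abstract describes, the library could hold embeddings of both experimental and predicted spectra.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query_embedding, library):
    """Return (peptide, score) for the library entry most similar to the query.

    `library` maps a peptide label to its embedding vector. This layout is
    hypothetical and chosen only to keep the example short.
    """
    best_peptide, best_score = None, -math.inf
    for peptide, embedding in library.items():
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_peptide, best_score = peptide, score
    return best_peptide, best_score


# Toy, made-up embeddings; a real model would produce much longer vectors.
library = {
    "PEPTIDEA": [0.9, 0.1, 0.0],
    "PEPTIDEB": [0.0, 1.0, 0.0],
    "PEPTIDEC": [0.1, 0.1, 0.9],
}
query = [0.8, 0.2, 0.1]

peptide, score = best_library_match(query, library)
print(peptide, round(score, 3))
```

A linear scan like this is only practical for small libraries; large-scale searches would typically restrict candidates first (for example by precursor mass) or use an approximate nearest-neighbor index, neither of which is shown here.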

Funder

National Science Foundation

National Institutes of Health

University Precision Health Initiative

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/bioinformatics/article-pdf/40/Supplement_1/i257/58354825/btae220.pdf
