Transforming de novo peptide sequencing by explainable AI

Published: 2024-08-05

Author:

Wang Yu¹, Liang Zhendong¹, Ling Tianze², Chang Cheng³, Yang Tingpeng⁴, Xie Linhai², He Yonghong⁵

Affiliation:

1. Pengcheng Laboratory

2. State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics

3. National Center for Protein Sciences (Beijing)

4. Pengcheng Laboratory & Tsinghua Shenzhen International Graduate School

5. Tsinghua University

Abstract

De novo peptide sequencing is crucial for identifying novel proteins, yet its broader application is constrained by the lack of a robust quality control system. In response, we developed a transformer-based model, π-xNovo, that accurately predicts peptides. By analyzing the model's attention matrix, we elucidated the contribution of spectral peaks to amino acid predictions, thus making de novo sequencing results explainable. Leveraging these insights, we designed a quality control system, π-xNovo-QC, which distinguishes peptide predictions with an accuracy exceeding 80% and a sensitivity above 90%. Applying this system to a large-scale deep human proteome dataset resulted in the identification of 1,931,761 additional peptides, marking a 137% increase over traditional database search results. These newly identified high-confidence peptides facilitated a 17.9% increase in protein identification, a 23.59% increase in the detection of single amino acid polymorphism events, and a 20.02% increase in exon-skipping splicing events. The deployment of this explainable AI system holds significant potential for expanding the application of de novo peptide sequencing, particularly in exploring the dark matter of the proteome.
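
The abstract reports the quality control step in terms of accuracy and sensitivity. As a reminder of what those two figures measure, here is a minimal sketch using the standard confusion-matrix definitions. The function names and the counts are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's exact evaluation protocol may differ.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all QC decisions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Fraction of truly correct peptide predictions that the QC step accepted."""
    return tp / (tp + fn)


# Hypothetical counts (assumption, for illustration only):
# tp = correct predictions accepted, fn = correct predictions rejected,
# tn = wrong predictions rejected,  fp = wrong predictions accepted.
tp, tn, fp, fn = 920, 700, 300, 80

print(f"accuracy    = {accuracy(tp, tn, fp, fn):.2f}")
print(f"sensitivity = {sensitivity(tp, fn):.2f}")
```

With these made-up counts, a filter can reach high sensitivity while its overall accuracy is lower, because the wrongly accepted predictions (`fp`) lower accuracy but do not enter the sensitivity formula.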

Publisher

Springer Science and Business Media LLC

References: 36 articles.

1. Defining the mandate of proteomics in the post-genomics era: workshop report; Kenyon GL; Molecular & Cellular Proteomics, 2002

2. Quantitative proteomics in biological research; Wilm M; Proteomics, 2009

3. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search; Keller A; Analytical Chemistry, 2002

4. Andromeda: a peptide search engine integrated into the MaxQuant environment; Cox J; Journal of Proteome Research, 2011

5. Automated interpretation of high-energy collision-induced dissociation spectra of singly protonated peptides by 'seqms', a software aid for de novo sequencing by tandem mass spectrometry; Fernandez-de-Cossio J; Rapid Communications in Mass Spectrometry, 1998