Abstract
Peptide sequencing via tandem mass spectrometry (MS/MS) is fundamental to proteomics data analysis and plays a pivotal role in characterizing the proteins within biological systems. In contrast to conventional database search methods, deep learning models can sequence peptides de novo, including peptides absent from existing databases, thereby facilitating the identification and analysis of novel peptide sequences. Current deep learning models for peptide sequencing predominantly use autoregressive generation, in which early errors can cascade and substantially degrade overall sequence accuracy. In addition, sequential decoding algorithms such as beam search suffer from low inference speed. To address these issues, we introduce π-PrimeNovo, a non-autoregressive Transformer-based deep learning model designed to perform accurate and efficient de novo peptide sequencing. With the proposed architecture, π-PrimeNovo achieves significantly higher accuracy and up to 69x faster sequencing than state-of-the-art methods. This speed makes it highly suitable for computation-intensive peptide sequencing tasks such as metaproteomic research, where π-PrimeNovo efficiently identifies microbial species-specific peptides. Moreover, π-PrimeNovo accurately mines phosphopeptides in a non-enriched phosphoproteomic dataset, offering an alternative way to detect low-abundance post-translational modifications (PTMs). We suggest that this work not only advances the development of peptide sequencing techniques but also introduces a transformative computational model with wide-ranging implications for biological research.
Publisher
Cold Spring Harbor Laboratory