Abstract
A core computational challenge in the analysis of mass spectrometry data is the de novo sequencing problem, in which the generating amino acid sequence is inferred directly from an observed fragmentation spectrum without the use of a sequence database. Recently, deep learning models have made significant advances in de novo sequencing by learning from massive datasets of high-confidence labeled mass spectra. However, these methods are primarily designed for data-dependent acquisition (DDA) experiments. Over the past decade, the field of mass spectrometry has been moving toward using data-independent acquisition (DIA) protocols for the analysis of complex proteomic samples due to their superior specificity and reproducibility. Hence, we present a new de novo sequencing model called Cascadia, which uses a transformer architecture to handle the more complex data generated by DIA protocols. In comparisons with existing approaches for de novo sequencing of DIA data, Cascadia achieves state-of-the-art performance across a range of instruments and experimental protocols. Additionally, we demonstrate Cascadia's ability to accurately discover de novo coding variants and peptides from the variable region of antibodies.
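To make the de novo problem concrete, the sketch below works through a deliberately simplified version in Python. It computes the singly charged b-ion ladder of a known peptide (the forward problem) and then recovers the residues by matching the mass gaps between consecutive peaks to standard monoisotopic residue masses (the inverse, de novo problem). This is an illustrative toy, not Cascadia's method, which the abstract describes only as a transformer model. The helper names `b_ion_ladder` and `sequence_from_ladder`, the 0.02 Da tolerance, and the idealized complete, noise-free ladder are all hypothetical assumptions.

```python
"""Toy de novo sequencing from an idealized, singly charged b-ion ladder.

Hypothetical sketch: assumes monoisotopic masses, no modifications, and a
complete, noise-free b-ion series. It is not the Cascadia model.
"""

PROTON = 1.007276  # proton mass in Da, added to each singly charged b ion

# Standard monoisotopic residue masses (Da). Isoleucine is isobaric with
# leucine, so it is mapped to "L" below and cannot be distinguished.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def b_ion_ladder(peptide: str) -> list[float]:
    """Forward problem: m/z of b1 .. b(n-1) for a known peptide."""
    ladder = []
    running = PROTON
    for residue in peptide[:-1]:
        running += RESIDUE_MASS["L" if residue == "I" else residue]
        ladder.append(running)
    return ladder


def sequence_from_ladder(peaks: list[float], tol: float = 0.02) -> str:
    """Inverse (de novo) problem: read residues off consecutive peak gaps.

    Each gap is assigned the residue with the closest mass, or "?" if no
    residue lies within `tol` Da. The C-terminal residue is not recovered,
    since that would require the precursor mass.
    """
    residues = []
    previous = PROTON  # the b series starts from a bare proton
    for mz in sorted(peaks):
        gap = mz - previous
        best = min(RESIDUE_MASS, key=lambda aa: abs(RESIDUE_MASS[aa] - gap))
        residues.append(best if abs(RESIDUE_MASS[best] - gap) <= tol else "?")
        previous = mz
    return "".join(residues)


if __name__ == "__main__":
    peaks = b_ion_ladder("SAMPLER")
    print([round(mz, 4) for mz in peaks])
    print(sequence_from_ladder(peaks))  # recovers "SAMPLE"
```

Greedy gap matching of this kind fails quickly on real spectra, which have missing fragments, noise peaks, and multiple charge states; in DIA data, the wide isolation windows additionally mix fragments from several co-eluting precursors into one spectrum.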
Publisher
Cold Spring Harbor Laboratory