Predicting the similarity of two mass spectrometry runs using only MS1 data

Published: 2023-11-29

Authors:

Shouaib Abdullah, Lin Andy

Abstract

Background: Traditionally, researchers assess the similarity of a pair of mass spectrometry-based proteomics samples by comparing the lists of detected peptides that result from database searching or spectral library searching. Unfortunately, this strategy requires substantial knowledge of the sample and parameterization of the peptide detection step. Therefore, new methods are needed that can rapidly compare proteomics samples against each other without extensive knowledge of the sample.

Results: We present a set of neural network architectures that predict the proportion of confidently detected peptides in common between two proteomics runs using solely MS1 information as input. Specifically, when compared to several baseline models, we found that the convolutional and siamese neural networks obtained the best performance. In addition, we demonstrate that unsupervised clustering techniques can leverage the predicted output from our method to perform sample-level characterizations. Our methodology allows for the rapid comparison and characterization of proteomics samples sourced from a variety of acquisition methods, organisms, and instrument types.

Conclusions: We find that machine learning models, using only MS1 information, can be used to predict the similarity between liquid chromatography-tandem mass spectrometry proteomics runs.
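The traditional comparison mentioned in the abstract, comparing the lists of peptides detected in two runs, can be illustrated with a short sketch. The abstract does not state how the "proportion of peptides in common" is computed, so the Jaccard index used here is an assumption, and the function name `peptide_overlap` and the example peptide sequences are hypothetical.

```python
def peptide_overlap(run_a, run_b):
    """Return the Jaccard index of two collections of peptide sequences.

    Assumption: the shared proportion is |A and B| / |A or B|. The paper
    may define its target value differently.
    """
    set_a, set_b = set(run_a), set(run_b)
    union = set_a | set_b
    if not union:
        # Two empty peptide lists: define the overlap as 0.0.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical peptide lists from two runs, for illustration only.
run_1 = ["PEPTIDEK", "LVNELTEFAK", "AEFVEVTK"]
run_2 = ["LVNELTEFAK", "AEFVEVTK", "YLYEIAR"]

print(peptide_overlap(run_1, run_2))  # 2 shared / 4 distinct = 0.5
```

A value like this requires running a peptide detection step on both runs first; per the abstract, the presented models instead predict the shared proportion from MS1 information alone.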

Publisher

Cold Spring Harbor Laboratory

References (25 articles)

1. The PRIDE database and related tools and resources in 2019: improving support for quantification data

2. MassIVE.quant: a community resource of quantitative mass spectrometry-based proteomics datasets; Nat Methods, 2020

3. Repeatability and Reproducibility in Proteomic Identifications by Liquid Chromatography-Tandem Mass Spectrometry

4. DISMS2: A flexible algorithm for direct proteome-wide distance calculation of LC-MS/MS runs; BMC Bioinformatics, 2017

5. Molecular phylogenetics by direct comparison of tandem mass spectra; Rapid Commun Mass Spectrom, 2012