Variance Analysis of LC-MS Experimental Factors and Their Impact on Machine Learning

Published: 2023-05-02

Author:

Rehfeldt Tobias Greisager, Krawczyk Konrad, Echers Simon Gregersen, Marcatili Paolo, Palczynski Pawel, Röttger Richard, Schwämmle Veit

Abstract

Background: Machine learning (ML) technologies, especially deep learning (DL), have gained increasing attention in predictive mass spectrometry (MS) for enhancing the data processing pipeline from raw data analysis to end-user predictions and re-scoring. ML models need large-scale datasets for training and re-purposing, which can be obtained from a range of public data repositories. However, applying ML to public MS datasets on larger scales is challenging, as they vary widely in terms of data acquisition methods, biological systems, and experimental designs.

Results: We aim to facilitate ML efforts in MS data by conducting a systematic analysis of the potential sources of variance in public MS repositories. We also examine how these factors affect ML performance and perform a comprehensive transfer learning evaluation to assess the benefits of current best-practice transfer learning methods in the field.

Conclusions: Our findings show significantly higher levels of homogeneity within a project than between projects, which indicates that it is important to construct datasets that closely resemble future test cases, as transferability is severely limited for unseen datasets. We also found that transfer learning, although it did increase model performance, did not yield better performance than a non-pre-trained model.
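The comparison of homogeneity within a project against homogeneity between projects can be illustrated with a simple sum-of-squares decomposition. The following is a minimal sketch under stated assumptions, not the authors' analysis: the function name `variance_components`, the project labels, and the numeric values are all hypothetical and invented for illustration.

```python
from statistics import mean


def variance_components(groups):
    """Split the total sum of squares into within-group and between-group parts.

    groups: dict mapping a (hypothetical) project id to a list of measurements.
    Returns (within_ss, between_ss); their sum equals the total sum of squares.
    """
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Spread of each measurement around the mean of its own project.
    within_ss = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )
    # Spread of the project means around the grand mean, weighted by project size.
    between_ss = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    return within_ss, between_ss


# Toy measurements, invented for illustration only.
projects = {
    "project_A": [10.0, 11.0, 12.0],
    "project_B": [20.0, 21.0, 22.0],
}

within_ss, between_ss = variance_components(projects)
between_share = between_ss / (within_ss + between_ss)
print(f"within={within_ss:.1f} between={between_ss:.1f} share={between_share:.3f}")
```

In this toy case nearly all of the variation sits between the two projects, which is the kind of pattern under which a model trained on one project would be expected to transfer poorly to another.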

Publisher

Cold Spring Harbor Laboratory

References (33 articles)

1. Aebersold R, Mann M. Mass-spectrometric exploration of proteome structure and function. Nature. 2016;537:347–55.

2. Altelaar AFM, Munoz J, Heck AJR. Next-generation proteomics: towards an integrative view of proteome dynamics. Nat Rev Genet. 2013;14:35–48.

3. Noor Z, Ahn SB, Baker MS, Ranganathan S, Mohamedali A. Mass spectrometry–based protein identification in proteomics—a review. Brief Bioinform. Oxford Academic; 2020;22:1620–38.

4. Bantscheff M, Schirle M, Sweetman G, Rick J, Kuster B. Quantitative mass spectrometry in proteomics: a critical review. Anal Bioanal Chem. 2007;389:1017–31.

5. Yadav A, Marini F, Cuomo A, Bonaldi T. Software Options for the Analysis of MS-Proteomic Data. Methods Mol Biol. 2021;2361:35–59.