Variability analysis of LC-MS experimental factors and their impact on machine learning

Published: 2022-12-28 Volume: 12
ISSN:2047-217X
Container-title:GigaScience
language:en

Author:

Rehfeldt Tobias Greisager¹, Krawczyk Konrad¹, Echers Simon Gregersen², Marcatili Paolo³, Palczynski Pawel⁴, Röttger Richard¹, Schwämmle Veit⁴

Affiliation:

1. Department of Mathematics and Computer Science, University of Southern Denmark, 5230 Odense, Denmark

2. Department of Chemistry and Bioscience, Aalborg University, 9220 Aalborg, Denmark

3. Department of Health Technology, Technical University of Denmark, 2800 Kongens Lyngby, Denmark

4. Department of Biochemistry and Molecular Biology, University of Southern Denmark, 5230 Odense, Denmark

Abstract

Background: Machine learning (ML) technologies, especially deep learning (DL), have gained increasing attention in predictive mass spectrometry (MS) for enhancing the data-processing pipeline, from raw data analysis to end-user predictions and rescoring. ML models need large-scale datasets for training and repurposing, which can be obtained from a range of public data repositories. However, applying ML to public MS datasets at larger scales is challenging, as they vary widely in data acquisition methods, biological systems, and experimental designs.

Results: We aim to facilitate ML efforts in MS data by conducting a systematic analysis of the potential sources of variability in public MS repositories. We also examine how these factors affect ML performance and perform a comprehensive transfer learning analysis to evaluate the benefits of current best-practice methods in the field.

Conclusions: Our findings show significantly higher levels of homogeneity within a project than between projects. This indicates that it is important to construct training datasets that closely resemble future test cases, as transferability to unseen datasets is severely limited. We also found that transfer learning, although it did improve the performance of the pretrained model, did not yield better performance than a non-pretrained model.

Funder

Velux Foundation

Publisher

Oxford University Press (OUP)

Subject

Computer Science Applications,Health Informatics

Link

https://academic.oup.com/gigascience/article-pdf/doi/10.1093/gigascience/giad096/53598799/giad096.pdf

Reference: 34 articles.

1. Mass-spectrometric exploration of proteome structure and function;Aebersold;Nature,2016

2. Next-generation proteomics: towards an integrative view of proteome dynamics;Altelaar;Nat Rev Genet,2013

3. The ProteomeXchange consortium in 2017: supporting the cultural change in proteomics public data deposition;Deutsch;Nucleic Acids Res,2017

4. ProteomicsML: an online platform for community-curated data sets and tutorials for machine learning in proteomics;Rehfeldt;J Proteome Res,2023

5. Peptide retention time prediction;Moruz;Mass Spectrom Rev,2017

Cited by 3 articles.

1. Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning;Nature Communications;2024-06-26

2. Decoding the Impact of Neighboring Amino Acid on ESI-MS Intensity Output through Deep Learning;2024-02-06

3. Imputation of label-free quantitative mass spectrometry-based proteomics data using self-supervised deep learning;2023-01-13