Machine learning strategies to tackle data challenges in mass spectrometry-based proteomics

Published: 2024-05-05

Author:

Dens Ceder, Adams Charlotte, Laukens Kris, Bittremieux Wout

Abstract

In computational proteomics, machine learning (ML) has emerged as a vital tool for enhancing data analysis. Despite significant advancements, the diversity of ML model architectures and the complexity of proteomics data present substantial challenges in the effective development and evaluation of these tools. Here, we highlight the necessity for high-quality, comprehensive datasets to train ML models and advocate for the standardization of data to support robust model development. We emphasize the instrumental role of key datasets like ProteomeTools and MassIVE-KB in advancing ML applications in proteomics and discuss the implications of dataset size on model performance, highlighting that larger datasets typically yield more accurate models. To address data scarcity, we explore algorithmic strategies such as self-supervised pretraining and multi-task learning. Ultimately, we hope that this discussion can serve as a call to action for the proteomics community to collaborate on data standardization and collection efforts, which are crucial for the sustainable advancement and refinement of ML methodologies in the field.

Publisher

Cold Spring Harbor Laboratory

Cited by 1 article.

1. Koina: Democratizing machine learning for proteomics research (2024-06-03)