Machine learning on large-scale proteomics data identifies tissue- and cell type-specific proteins

Published: 2022-10-05

Author:

Claeys Tine, Menu Maxime, Bouwmeester Robbin, Gevaert Kris, Martens Lennart

Abstract

Using data from 183 public human data sets from PRIDE, a machine learning model was trained to identify tissue- and cell type-specific protein patterns. PRIDE projects were searched with ionbot, and tissue/cell type annotations were added manually. Data from physiological samples were used to train a Random Forest model on protein abundances to classify samples into tissues and cell types. Subsequently, one-vs-all classification and feature importance were used to analyse the most discriminating protein abundances per class. Based on protein abundance alone, the model predicted tissues with 98% accuracy and cell types with 99% accuracy. The F-scores give a clear view of tissue-specific proteins and tissue-specific protein expression patterns. In-depth feature analysis shows slight confusion between physiologically similar tissues, demonstrating the capacity of the algorithm to detect biologically relevant patterns. These results can in turn inform downstream uses, from identifying the tissue of origin of proteins in complex samples such as liquid biopsies to studying the proteome of tissue-like samples such as organoids and cell lines.
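
The workflow summarised in the abstract (a Random Forest trained on protein abundances, followed by one-vs-all models inspected through feature importance) can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn and NumPy, uses a small synthetic abundance matrix in place of the PRIDE data, and the tissue labels, matrix sizes, and variable names are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a samples x proteins abundance matrix (NOT PRIDE data).
rng = np.random.default_rng(0)
tissues = ["liver", "brain", "heart"]   # hypothetical class labels
n_per_class, n_proteins = 40, 30        # hypothetical sizes

X_parts, y_parts = [], []
for i, tissue in enumerate(tissues):
    block = rng.normal(0.0, 1.0, size=(n_per_class, n_proteins))
    block[:, i] += 5.0                  # protein i is enriched in tissue i
    X_parts.append(block)
    y_parts.extend([tissue] * n_per_class)
X = np.vstack(X_parts)
y = np.array(y_parts)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Multi-class Random Forest: classify each sample into a tissue.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

# One-vs-all: one binary model per tissue, then rank proteins by
# impurity-based feature importance to find the most discriminating one.
top_protein = {}
for tissue in tissues:
    ova = RandomForestClassifier(n_estimators=200, random_state=0)
    ova.fit(X_train, y_train == tissue)
    top_protein[tissue] = int(np.argmax(ova.feature_importances_))

print(f"held-out accuracy: {accuracy:.2f}")
print("most discriminating protein index per tissue:", top_protein)
```

In this toy setting each tissue has exactly one enriched protein, so the one-vs-all importance ranking should point back to it; real abundance data would be far sparser and noisier, and would need the missing-value handling and validation that this sketch omits.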

Publisher

Cold Spring Harbor Laboratory

Reference: 25 articles.

1. PRIDE: The proteomics identifications database

2. The PRIDE database resources in 2022: a hub for mass spectrometry-based proteomics evidences;Nucleic Acids Res,2022

3. Exploring the potential of public proteomics data

4. The Age of Data-Driven Proteomics: How Machine Learning Enables Novel Workflows;Proteomics,2020

5. The online Tabloid Proteome: An annotated database of protein associations;Nucleic Acids Res,2018

Cited by 1 article.

1. Tissue‐based absolute quantification using large‐scale TMT and LFQ experiments;PROTEOMICS;2023-07-24