SMAP: A pipeline for sample matching in proteogenomics

Published: 2021-09-20

Author:

Li Ling, Niu Mingming, Erickson Alyssa, Luo Jie, Rowbotham Kincaid, Huang He, Li Yuxin, Jiang Yi, Liu Chunyu, Peng Junmin, Wang Xusheng

Abstract

Integration of genomics and proteomics (proteogenomics) offers unprecedented promise for in-depth understanding of human diseases. However, sample mix-up is a pervasive, recurring problem because of the complex sample processing in proteogenomics. Here we present a pipeline for Sample Matching in Proteogenomics (SMAP) that verifies sample identity to ensure data integrity. SMAP infers sample-dependent protein-coding variants from quantitative mass spectrometry (MS) data and aligns the MS-based proteomic samples with genomic samples using two discriminant scores. Theoretical analysis with simulation data indicates that SMAP is capable of uniquely matching proteomic and genomic samples when ≥20% of the genotypes of individual samples are available. When SMAP was applied to a large-scale proteomics dataset from 288 biological samples generated by the PsychENCODE BrainGVEX project, we identified and corrected 18.8% (54/288) mismatched samples. The correction was further confirmed by ribosome profiling and assay for transposase-accessible chromatin sequencing data from the same set of samples. Thus, our results demonstrate that SMAP is an effective tool for sample verification in large-scale MS-based proteogenomics studies. The source code, manual, and sample data for SMAP are publicly available at https://github.com/UND-Wanglab/SMAP, and a web-based SMAP can be accessed at https://smap.shinyapps.io/smap/.
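
The matching idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the SMAP implementation: the abstract does not define its two discriminant scores, so the two used here (a genotype concordance fraction and the margin between the best and second-best concordance) are illustrative stand-ins, and the function name `match_samples`, the 0/1/2 genotype coding, and the error-free simulated data are all hypothetical.

```python
import numpy as np

def match_samples(prot_geno, dna_geno):
    """Assign each proteomic sample to its most concordant genomic sample.

    prot_geno: (P, V) float array of genotypes inferred from MS data,
               with np.nan where a variant was not observed (assumed coding 0/1/2).
    dna_geno:  (G, V) integer array of genotypes from genomic data, G >= 2.
    Returns one (best_index, concordance, margin) tuple per proteomic sample.
    """
    results = []
    for p in prot_geno:
        observed = ~np.isnan(p)
        # Illustrative score 1: fraction of observed variants that agree
        # with each genomic sample.
        concordance = (dna_geno[:, observed] == p[observed]).mean(axis=1)
        order = np.argsort(concordance)[::-1]
        best, second = order[0], order[1]
        # Illustrative score 2: gap between the best and second-best match.
        margin = concordance[best] - concordance[second]
        results.append((int(best), float(concordance[best]), float(margin)))
    return results

# Simulated data: 6 genomic samples, 200 variants, labels shuffled on the
# proteomic side, and only 20% of genotypes retained per proteomic sample.
rng = np.random.default_rng(0)
n_samples, n_variants = 6, 200
dna = rng.integers(0, 3, size=(n_samples, n_variants))
true_order = rng.permutation(n_samples)
prot = dna[true_order].astype(float)
for row in prot:
    keep = rng.choice(n_variants, size=40, replace=False)
    missing = np.ones(n_variants, dtype=bool)
    missing[keep] = False
    row[missing] = np.nan  # rows are views, so this edits prot in place

matches = match_samples(prot, dna)
for i, (best, score, margin) in enumerate(matches):
    print(f"proteomic sample {i} -> genomic sample {best} "
          f"(concordance={score:.2f}, margin={margin:.2f})")
```

In this idealized setting every proteomic sample recovers its true genomic partner, and a mislabeled sample would show up as a best match that differs from its recorded label. Real MS-inferred genotypes are noisy, so a practical check would need thresholds on both scores rather than exact agreement.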

Publisher

Cold Spring Harbor Laboratory
