APIR: Aggregating Universal Proteomics Database Search Algorithms for Peptide Identification with FDR Control

Published: 2024-04; Volume: 22; Issue: 2
ISSN: 1672-0229
Journal: Genomics, Proteomics & Bioinformatics
Language: English

Authors:

Chen Yiling Elaine¹; Ge Xinzhou¹; Woyshner Kyla²; McDermott MeiLu²,³; Manousopoulou Antigoni²; Ficarro Scott B⁴; Marto Jarrod A⁴; Li Kexin¹; Wang Leo David²,⁵; Li Jingyi Jessica¹,⁶,⁷,⁸,⁹

Affiliations:

1. Department of Statistics and Data Science, University of California, Los Angeles, CA 90095, USA

2. Department of Immuno-Oncology, Beckman Research Institute, City of Hope National Medical Center, Duarte, CA 91010, USA

3. Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA

4. Department of Cancer Biology and Blais Proteomics Center, Dana-Farber Cancer Institute, Department of Pathology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02215, USA

5. Department of Pediatrics, City of Hope National Medical Center, Duarte, CA 91010, USA

6. Bioinformatics Interdepartmental Program, University of California, Los Angeles, CA 90095, USA

7. Department of Human Genetics, University of California, Los Angeles, CA 90095, USA

8. Department of Computational Medicine, University of California, Los Angeles, CA 90095, USA

9. Department of Biostatistics, University of California, Los Angeles, CA 90095, USA

Abstract

Advances in mass spectrometry (MS) have enabled high-throughput analysis of proteomes in biological systems. State-of-the-art MS data analysis relies on database search algorithms to quantify proteins by identifying peptide–spectrum matches (PSMs), which match mass spectra to peptide sequences. Because different database search algorithms use distinct search strategies, each may identify PSMs that the others miss. However, no existing approach can aggregate the results of arbitrary user-specified database search algorithms while both guaranteeing an increase in the number of identified peptides and controlling the false discovery rate (FDR). To fill this gap, we propose a statistical framework, Aggregation of Peptide Identification Results (APIR), that is compatible with any database search algorithm. Notably, at a given FDR threshold, APIR is guaranteed to identify at least as many peptides as each individual database search algorithm. On a complex proteomics standard dataset, APIR was more powerful than the individual database search algorithms while empirically controlling the FDR. Real data studies showed that APIR can identify disease-related proteins and post-translational modifications missed by some individual database search algorithms. The APIR framework is easily extendable to aggregating discoveries made by multiple algorithms in other high-throughput biomedical data analyses, e.g., differential gene expression analysis on RNA sequencing data. The APIR R package is available at https://github.com/yiling0210/APIR.
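For readers unfamiliar with how an individual database search algorithm controls the FDR of its PSMs, a common approach is target-decoy competition: spectra are searched against real (target) and reversed or shuffled (decoy) sequences, and the number of decoy hits above a score cutoff estimates the number of false target hits. The sketch below illustrates that generic technique only; it is not APIR's aggregation procedure. The `PSM` class and `target_decoy_filter` function are hypothetical names, and the code assumes one best match per spectrum, higher scores meaning better matches, the (decoys + 1) / targets estimate, and no special handling of tied scores.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PSM:
    """Hypothetical PSM record: best match for one spectrum."""
    spectrum_id: str
    peptide: str
    score: float      # assumed: higher score = better match
    is_decoy: bool    # True if the match is to a decoy sequence


def target_decoy_filter(psms: List[PSM], q: float) -> List[PSM]:
    """Return target PSMs accepted at estimated FDR <= q (target-decoy competition).

    Walks down the score-ranked list and keeps the largest top-ranked prefix whose
    estimated FDR, (decoys + 1) / targets, does not exceed q.
    """
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    n_target = n_decoy = 0
    cutoff = 0  # number of top-ranked PSMs to keep
    for i, psm in enumerate(ranked, start=1):
        if psm.is_decoy:
            n_decoy += 1
        else:
            n_target += 1
        fdr_hat = (n_decoy + 1) / max(n_target, 1)
        if fdr_hat <= q:
            cutoff = i  # remember the largest prefix that satisfies the threshold
    # Decoys are used only for estimation; report only the target PSMs
    return [p for p in ranked[:cutoff] if not p.is_decoy]


if __name__ == "__main__":
    demo = [
        PSM("s1", "PEPTIDEK", 10.0, False),
        PSM("s2", "SAMPLER", 9.0, False),
        PSM("s3", "KEDITPEP", 5.0, True),
        PSM("s4", "ELVISK", 4.0, False),
    ]
    for psm in target_decoy_filter(demo, q=0.5):
        print(psm.spectrum_id, psm.peptide, psm.score)
```

Taking the largest qualifying prefix, rather than stopping at the first score where the estimate exceeds q, yields the largest accepted set allowed by this estimator. Counting unique `peptide` values among the accepted PSMs gives a peptide-level count.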

Funder

National Cancer Institute, USA

National Cancer Institute under Cancer Center

Publisher

Oxford University Press (OUP)

Link

https://academic.oup.com/gpb/advance-article-pdf/doi/10.1093/gpbjnl/qzae042/58064554/qzae042.pdf
