Missing values are informative in label-free shotgun proteomics data: estimating the detection probability curve

Published: 2022-07-03

Author:

Li Mengbo, Smyth Gordon K.

Abstract

Mass spectrometry proteomics is a powerful tool in biomedical research but its usefulness is limited by the frequent occurrence of missing values in peptides that cannot be reliably quantified for particular samples. Many analysis strategies have been proposed for missing values where the discussion often focuses on distinguishing whether values are missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). We argue here that missing values should always be viewed as MNAR in label-free proteomics because physical missing value mechanisms cannot be identified for individual points and because the probability of detection is related to underlying intensity. We show that the probability of detection can be accurately modeled by a logit linear curve. The curve asymptotes close to 100%, limiting the potential role of missing values unrelated to intensity. The curve is also incompatible with simple censoring mechanisms. We propose a statistical method for estimating the detection probability curve as a function of the underlying intensity, whether observed or not. The model quantifies the bias of missing intensities as compared to those that are observed. The model demonstrates that missing values are informative and suggests possible approaches to imputation and differential expression.
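
To make the idea of a logit-linear detection probability curve concrete, the following is a minimal simulation sketch, not the authors' estimation method. It assumes a curve of the form logit(p) = beta0 + beta1 * intensity; the parameter values, the normal distribution of log-intensities, and all function names are hypothetical choices made for illustration only.

```python
import math
import random


def detection_probability(intensity, beta0, beta1):
    """Logit-linear curve: logit(p) = beta0 + beta1 * intensity."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * intensity)))


def simulate(n=20000, beta0=-10.0, beta1=0.5, seed=1):
    """Draw hypothetical log-intensities and split them by detection.

    All numeric settings here are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    observed, missing = [], []
    for _ in range(n):
        y = rng.gauss(20.0, 3.0)  # assumed underlying log-intensity
        if rng.random() < detection_probability(y, beta0, beta1):
            observed.append(y)
        else:
            missing.append(y)
    return observed, missing


observed, missing = simulate()
mean_observed = sum(observed) / len(observed)
mean_missing = sum(missing) / len(missing)

# Under this assumed curve, undetected values tend to have lower intensity.
print(f"mean observed intensity: {mean_observed:.2f}")
print(f"mean missing intensity:  {mean_missing:.2f}")
```

In this sketch the undetected values have a lower average intensity than the detected ones, which is the sense in which missingness carries information about the underlying intensity.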

Publisher

Cold Spring Harbor Laboratory
