
Missing value imputation in proximity extension assay-based targeted proteomics data

Published:2020-12-14 Issue:12 Volume:15 Page:e0243487
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Lenz Michael, Schulz Andreas, Koeck Thomas, Rapp Steffen, Nagler Markus, Sauer Madeleine, Eggebrecht Lisa, Ten Cate Vincent, Panova-Noeva Marina, Prochaska Jürgen H., Lackner Karl J., Münzel Thomas, Leineweber Kirsten, Wild Philipp S., Andrade-Navarro Miguel A.

Abstract

Targeted proteomics utilizing antibody-based proximity extension assays provides sensitive and highly specific quantifications of plasma protein levels. Multivariate analysis of these data is hampered by frequent missing values (random or left-censored), calling for imputation approaches. While appropriate missing-value imputation methods exist, benchmarks of their performance in targeted proteomics data are lacking. Here, we assessed the performance of two methods for imputation of values missing completely at random, the previously top-benchmarked ‘missForest’ and the recently published ‘GSimp’ method. Evaluation was accomplished by comparing imputed with remeasured relative concentrations of 91 inflammation-related circulating proteins in 86 samples from a cohort of 645 patients with venous thromboembolism. The median Pearson correlation between imputed and remeasured protein expression values was 69.0% for missForest and 71.6% for GSimp (p = 5.8e-4). Imputation with missForest resulted in a stronger reduction of variance than GSimp (median relative variance of 25.3% vs. 68.6%, p = 2.4e-16) and an undesirably larger bias in downstream analyses. Irrespective of the imputation method used, the 91 imputed proteins revealed large variations in imputation accuracy, driven by differences in signal-to-noise ratio and information overlap between proteins. In summary, GSimp outperformed missForest, while both methods show good overall imputation accuracy with large variations between proteins.
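
The two summary statistics quoted in the abstract (per-protein Pearson correlation between imputed and remeasured values, and relative variance) can be illustrated with a minimal sketch. This is not the authors' code: the function name `per_protein_metrics` is hypothetical, "relative variance" is assumed here to mean the variance of the imputed values divided by the variance of the remeasured values, and the data are synthetic toy values, not study data.

```python
import numpy as np


def per_protein_metrics(imputed, remeasured):
    """Return (pearson_r, relative_variance), one value per protein (column).

    Both inputs are arrays of shape (n_samples, n_proteins).
    Assumption: relative variance = var(imputed) / var(remeasured).
    Columns with zero variance would produce a division by zero.
    """
    imputed = np.asarray(imputed, dtype=float)
    remeasured = np.asarray(remeasured, dtype=float)
    if imputed.shape != remeasured.shape:
        raise ValueError("imputed and remeasured must have the same shape")

    # Center each protein (column) before computing the correlation.
    ic = imputed - imputed.mean(axis=0)
    rc = remeasured - remeasured.mean(axis=0)

    # Column-wise Pearson correlation coefficient.
    r = (ic * rc).sum(axis=0) / np.sqrt((ic ** 2).sum(axis=0) * (rc ** 2).sum(axis=0))

    # Ratio of sample variances; values below 1 indicate variance shrinkage.
    rel_var = imputed.var(axis=0, ddof=1) / remeasured.var(axis=0, ddof=1)
    return r, rel_var


# Synthetic toy data shaped like the study design (86 samples, 91 proteins).
rng = np.random.default_rng(0)
remeasured = rng.normal(size=(86, 91))
# A toy "imputation" that shrinks the signal and adds noise.
imputed = 0.6 * remeasured + rng.normal(scale=0.4, size=(86, 91))

r, rel_var = per_protein_metrics(imputed, remeasured)
print("median Pearson r:", np.median(r))
print("median relative variance:", np.median(rel_var))
```

Under this reading, a median relative variance well below 100% means the imputed values are less dispersed than the remeasured ones, which is the shrinkage effect the abstract reports as stronger for missForest than for GSimp.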

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary

References: 19 articles.

1. Implementation of proteomics in clinical trials;T He;Proteomics Clin Appl,2019

2. Novel endotypes in heart failure: effects on guideline-directed medical therapy;J Tromp;Eur Heart J,2018

3. Missing data and multiple imputation in clinical epidemiological research;AB Pedersen;Clin Epidemiol,2017

4. A primer on maximum likelihood algorithms available for use with missing data;CK Enders;Structural Equation Modeling,2001

5. Review: a gentle introduction to imputation of missing values;AR Donders;J Clin Epidemiol,2006

Cited by 9 articles.

1. Assembly structures of coastal woody species of eastern South America: Patterns and drivers;Plant Diversity;2024-09

2. Rice phenology monitoring via ensemble classification for an extremely imbalanced multiclass dataset of hybrid remote sensing;Remote Sensing Applications: Society and Environment;2024-08

3. Multiple aspects of tree beta diversity in coastal ecosystems in Brazil;Journal of Biogeography;2024-04-02

4. iTa-DFiE: An Innovative Tensor Algebra-Based Detection Framework for Incomplete Noninvasive Electroencephalography;IEEE Access;2024

5. Dealing with missing values in proteomics data;PROTEOMICS;2022-11-17