Abstract
Recent developments in mass spectrometry (MS) instruments and data acquisition modes have aided multiplexed, fast, reproducible and quantitative analysis of proteome profiles, yet missing values remain a formidable challenge for proteomics data analysis. The stochastic nature of sampling in Data Dependent Acquisition (DDA), suboptimal preprocessing of Data Independent Acquisition (DIA) runs and the limited dynamic range of MS instruments impede the reproducibility and accuracy of peptide quantification and can introduce systematic patterns of missingness that affect downstream analyses. Imputation of missing values is therefore an important element of data analysis. We introduce msImpute, an imputation method based on low-rank approximation, and compare it to six alternative imputation methods using public DDA and DIA datasets. We evaluate the performance of the methods by determining the error of imputed values and the accuracy of differential expression detection. We also measure how well structures in the data are preserved after imputation at different levels of granularity. We develop a visual diagnostic to determine the nature of missingness in datasets based on peptides with a high biological dropout rate, and introduce a method to identify such peptides. Our findings demonstrate that msImpute performs well when data are missing at random, and highlight the importance of prior knowledge about the nature of missing values in a dataset when selecting an imputation technique.
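As a rough illustration of what imputation by low-rank approximation involves, the sketch below fills missing entries by repeatedly fitting a truncated SVD and copying its values into the gaps. It is a generic iterative scheme written for this note, not the msImpute algorithm or its API. The function name `low_rank_impute`, its parameters, and the simulated peptide-by-sample matrix are all hypothetical. The sketch also assumes values are missing at random and that every row has at least one observed value.

```python
import numpy as np

def low_rank_impute(X, rank=2, n_iter=100, tol=1e-6):
    """Fill NaNs in X using an iterated truncated SVD (illustrative sketch only)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialise missing entries with the row (peptide) mean of observed values.
    row_means = np.nanmean(X, axis=1, keepdims=True)
    filled = np.where(missing, row_means, X)
    for _ in range(n_iter):
        # Best rank-`rank` approximation of the current completed matrix.
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep observed values; overwrite only the missing entries.
        updated = np.where(missing, approx, X)
        change = np.linalg.norm(updated - filled) / max(np.linalg.norm(filled), 1e-12)
        filled = updated
        if change < tol:
            break
    return filled

# Simulated example (assumption): 200 peptides x 12 samples with rank-2 structure,
# and about 10% of entries removed completely at random.
rng = np.random.default_rng(0)
truth = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 12))
mask = rng.random(truth.shape) < 0.1
observed = truth.copy()
observed[mask] = np.nan

imputed = low_rank_impute(observed, rank=2)
# Error of imputed values, measured only on the entries that were masked.
rmse = np.sqrt(np.mean((imputed[mask] - truth[mask]) ** 2))
print(f"RMSE on masked entries: {rmse:.4f}")
```

Computing the root-mean-square error on the masked entries mirrors, in a simplified form, the idea of evaluating a method by the error of its imputed values. A scheme like this one relies on the missing-at-random assumption, so it would not be expected to behave as well when missingness depends on abundance.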
Publisher
Cold Spring Harbor Laboratory