Abstract
Labelling strategies in mass spectrometry (MS)-based proteomics increase sample throughput by acquiring multiplexed samples in a single run. However, contemporary designs often require multiple runs, which induces a complex correlation structure in the data. Addressing this correlation is key for correct statistical inference and reliable biomarker discovery. We therefore present msqrob2TMT, a set of mixed model-based workflows for differential abundance analysis of labelled MS-based proteomics data. Thanks to its increased flexibility, msqrob2TMT can model both sample-specific and feature-specific (e.g. peptide or protein) covariates, which extends inference to experiments with arbitrarily complex designs and allows explicit correction for feature-specific properties. We benchmark our novel workflows against the state-of-the-art tools MSstatsTMT and DEqMS in a spike-in study. We show that our workflows are modular and more flexible, and that they achieve improved performance by adopting robust ridge regression. We also found that reference channel normalization and imputation can have a deleterious impact on the statistical outcome. Finally, we demonstrate the utility of msqrob2TMT on a real-life mouse study, showcasing the importance of effectively accounting for the hierarchical correlation structure in the data.
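To make "robust ridge regression" concrete, the following is a minimal NumPy sketch of the general technique: Huber M-estimation combined with a ridge penalty, fitted by iteratively reweighted least squares (IRLS). It is not the msqrob2TMT implementation (msqrob2TMT is an R/Bioconductor workflow). The function names `huber_weights` and `robust_ridge`, the Huber constant `k = 1.345`, the MAD-based scale estimate and the hand-fixed penalty `lam` are all illustrative assumptions.

```python
import numpy as np


def huber_weights(residuals, k=1.345):
    """IRLS weights for Huber's loss: 1 for small residuals, k/|u| for large ones."""
    # Robust scale: median absolute deviation, rescaled to match the SD under normality.
    mad = np.median(np.abs(residuals - np.median(residuals))) / 0.6745
    scale = max(mad, 1e-12)  # floor avoids division by zero when the fit is exact
    u = np.abs(residuals) / scale
    # Equals min(1, k/u), written so that no division by zero can occur.
    return k / np.maximum(u, k)


def robust_ridge(X, y, lam=1.0, k=1.345, n_iter=100, tol=1e-8):
    """Huber M-estimation with a ridge penalty, fitted by IRLS (illustrative sketch).

    Assumes column 0 of X is the intercept, which is left unpenalised.
    """
    n, p = X.shape
    penalty = lam * np.eye(p)
    penalty[0, 0] = 0.0  # do not shrink the intercept
    w = np.ones(n)
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Penalised weighted normal equations: (X'WX + lam*D) beta = X'Wy
        beta_new = np.linalg.solve(X.T @ (w[:, None] * X) + penalty, X.T @ (w * y))
        # Down-weight observations with large residuals for the next iteration.
        w = huber_weights(y - X @ beta_new, k)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta, w
```

With `lam=0` this reduces to ordinary Huber M-estimation, so a single outlying intensity receives a weight below 1 and pulls the slope less than in ordinary least squares; increasing `lam` shrinks the non-intercept coefficients toward zero, which stabilises estimates when a feature has few observations.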
Publisher
Cold Spring Harbor Laboratory