Abstract
Co-Fractionation Mass Spectrometry (CFMS) enables the discovery of protein complexes and systems-level analyses of the multimer dynamics that facilitate responses to environmental and developmental conditions. A major challenge in CFMS analyses, and in omics approaches in general, is to conduct validation experiments at scale and to develop precise methods for evaluating the performance of the analyses. For protein complex composition predictions, CORUM is commonly used as a source of known complexes; however, the subunit pools in cell extracts are very rarely in the assumed fully assembled states. A fundamental conflict therefore exists between the assumed multimerization of the CORUM “gold standards” and the CFMS experimental datasets to be evaluated. In this paper, we develop a machine learning-based “small world” data analysis method. This method uses size exclusion chromatography profiles of predicted CORUM complex subunits to identify relatively rare instances of fully assembled complexes, as well as bona fide stable CORUM subcomplexes. It involves a two-stage machine learning approach that integrates information from CORUM and CFMS experiments to generate reliable gold standards of protein complexes. The predictions are evaluated both by statistical significance and by a size comparison between calculated and predicted complexes. These validated gold standards are then used to assess the overall reliability of CFMS-based protein complex composition predictions.
Publisher
Cold Spring Harbor Laboratory