Abstract
Understanding the complex interactions between the transcriptome and the proteome is essential to uncovering cellular mechanisms in both health and disease. The limited correlation between the abundance of a transcript and that of its corresponding protein suggests that regulatory processes tightly govern the flow of information around transcription, translation, and beyond. In this study we expand the scope of features used to model the human proteome: we develop machine learning models that incorporate sequence-derived features (SDFs), sometimes in conjunction with the corresponding mRNA levels. We build a large resource of sequence-derived features covering a significant proportion of the H. sapiens proteome, demonstrate which of these features are significant for prediction in multiple cell lines, and suggest which biological processes these features can explain. We (a) show that SDFs are significantly better at predicting protein abundance across multiple cell lines, in both steady-state and dynamic contexts, (b) show that SDFs can cover the domain of translation with relative efficiency but struggle with cell-line-specific pathways, and (c) provide a resource that can be plugged into many subsequent protein-centric analyses.
Publisher
Cold Spring Harbor Laboratory