Abstract
Transcript abundance is a widely used but poor predictor of protein abundance. Because proteins are the actual agents executing biological functions, and because signaling outcome depends non-linearly on the concentration of the network components, we aimed to develop a convolutional neural network (CNN)-based predictor of protein abundance for Homo sapiens and the reference plant Arabidopsis thaliana. After hyperparameter optimization and initial analysis of the training data, we employed separate training modules for value data and sequence data, predicting 40% of the variance in protein levels in Homo sapiens and 48% in Arabidopsis thaliana. Codon counts and peptides had the greatest predictive power. Extracting the learned weights revealed generally similar trends but also some intriguing differences between human and Arabidopsis. Many learned motifs in the 5’ and 3’ UTRs correspond to previously described regulatory features, demonstrating that the model can learn mechanistically relevant features ab initio.
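As an illustration of what a codon-count feature might look like, the following minimal Python sketch converts a coding sequence into a fixed-length vector of 64 in-frame codon counts. It is not the authors' pipeline: the helper name codon_counts, the ordering of the 64 codons, and the handling of ambiguous bases and trailing partial codons are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# All 64 codons in a fixed order so every transcript maps to the same feature layout.
# (The ordering is an assumption for this sketch, not taken from the paper.)
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_counts(cds: str) -> list[int]:
    """Count in-frame codons of a coding sequence (hypothetical helper)."""
    seq = cds.upper().replace("U", "T")
    # Step through the sequence in frame; a trailing partial codon is ignored.
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    # Triplets with ambiguous bases (e.g. N) are not in CODONS and are dropped.
    return [counts.get(codon, 0) for codon in CODONS]


# Toy coding sequence: ATG GCC GCC TAA
features = codon_counts("ATGGCCGCCTAA")
```

A vector like features could serve as one of the value inputs to a predictor, alongside transcript abundance. Whether the counts are normalized by sequence length is left open here.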
Publisher
Cold Spring Harbor Laboratory