Authors:
Gandhi Shreshth, Lee Leo J., Delong Andrew, Duvenaud David, Frey Brendan J.
Abstract
Motivation
Determining RNA-binding protein (RBP) binding specificity is crucial for understanding many cellular processes and genetic disorders. RBP binding is known to be affected by both the sequence and the structure of RNAs. Deep learning can learn generalizable representations from raw data and has improved the state of the art in several fields, such as image classification, speech recognition and even genomics. Previous work on RBP binding has used either shallow models that combine sequence and structure, or deep models that use only the sequence. Here we combine both abilities by augmenting and refining the original DeepBind architecture to capture structural information, obtaining significantly better performance.

Results
We propose two deep architectures. The first is a lightweight convolutional network for transcriptome-wide inference. The second is a Long Short-Term Memory (LSTM) network suited to small batches of data. We incorporate computationally predicted secondary structure features as input to our models and show their effectiveness in boosting prediction performance. Our models achieved significantly higher correlations on held-out in vitro test data than previous approaches. They also generalise well to in vivo CLIP-seq data, achieving higher median AUCs than other approaches. We analysed the output of our model for VTS1 and CPO and provided intuition into how it works. Our models confirmed known secondary structure preferences for some proteins and identified new proteins for which secondary structure might play a role. We also demonstrated strengths of our model relative to other approaches, such as its ability to combine information from distant positions along the input.

Availability
Software and models are available at https://github.com/shreshthgandhi/cDeepbind

Contact
ljlee@psi.toronto.edu, frey@psi.toronto.edu
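The abstract describes supplying predicted secondary structure alongside sequence to a DeepBind-style convolutional network. The following is a minimal sketch under stated assumptions, not the authors' cDeepbind implementation. It assumes structure arrives as per-position probabilities over four hypothetical contexts, for example as they might be derived from an RNA folding tool. Those probabilities are concatenated with a one-hot sequence encoding, and the result is scored with convolution, rectification, max pooling over positions and a linear output. All function, parameter and context names here are hypothetical.

```python
import numpy as np

ALPHABET = "ACGU"
# Hypothetical structural contexts; the abstract does not specify the feature set.
CONTEXTS = ("paired", "hairpin", "internal", "unpaired")


def one_hot(seq):
    """One-hot encode an RNA sequence as an (L, 4) array; T is read as U, unknown bases stay all-zero."""
    seq = seq.upper().replace("T", "U")
    x = np.zeros((len(seq), len(ALPHABET)))
    for i, base in enumerate(seq):
        if base in ALPHABET:
            x[i, ALPHABET.index(base)] = 1.0
    return x


def combine(seq, struct_probs):
    """Concatenate sequence channels with per-position structure probabilities -> (L, 4 + len(CONTEXTS))."""
    struct_probs = np.asarray(struct_probs, dtype=float)
    if struct_probs.shape != (len(seq), len(CONTEXTS)):
        raise ValueError("struct_probs must have shape (len(seq), len(CONTEXTS))")
    return np.concatenate([one_hot(seq), struct_probs], axis=1)


def conv_maxpool_score(x, filters, bias, w_out, b_out):
    """Score an (L, C) input with F filters of shape (K, C): valid convolution, ReLU, max pool, linear output."""
    num_filters, width, channels = filters.shape
    if x.shape[1] != channels:
        raise ValueError("channel mismatch between input and filters")
    n = x.shape[0] - width + 1
    if n < 1:
        raise ValueError("sequence shorter than filter width")
    # All length-K windows of the input: shape (n, K, C).
    windows = np.stack([x[i:i + width] for i in range(n)])
    # Filter responses at every position: shape (n, F).
    act = np.einsum("nkc,fkc->nf", windows, filters) + bias
    act = np.maximum(act, 0.0)   # rectification
    pooled = act.max(axis=0)     # strongest response per filter, shape (F,)
    return float(pooled @ w_out + b_out)
```

Because the structure channels are concatenated with the one-hot channels, each filter can respond jointly to a base and its predicted structural context at every position. Max pooling then keeps only each filter's strongest match anywhere in the input, so a single well-placed motif can drive the score.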
Publisher
Cold Spring Harbor Laboratory