Affiliation:
1. School of Mathematical Sciences
2. Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and the State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences
3. Center for Quantitative Biology, Peking University, Beijing, China
Abstract
Motivation
Convolutional neural networks (CNNs) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. While previous studies have built a connection between CNNs and probabilistic models, simple CNN models cannot achieve sufficient accuracy on this problem. Recent methods have improved performance by using more complex neural networks, but their results cannot be directly interpreted. It therefore remains difficult to combine probabilistic models and CNNs effectively to improve DNA–protein binding predictions.
Results
In this article, we present a novel global pooling method, expectation pooling, for predicting DNA–protein binding. Our pooling method stems naturally from the expectation maximization algorithm, and its benefits can be interpreted both statistically and via deep learning theory. Through experiments, we demonstrate that our pooling method improves the prediction performance of DNA–protein binding models. Our interpretable pooling method combines probabilistic ideas with global pooling by taking the expectations of inputs without increasing the number of parameters. We also analyze the hyperparameters of our method and propose optional structures to help fit different datasets. We explore how to use these novel pooling methods effectively and show that combining statistical methods with deep learning is highly beneficial, which is promising for future studies in this field.
Availability and implementation
All code is publicly available at https://github.com/gao-lab/ePooling.
Supplementary information
Supplementary data are available at Bioinformatics online.
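To make the idea of "taking the expectations of inputs" concrete, the following is a minimal sketch rather than the authors' implementation. It assumes that the expectation is taken under softmax weights computed from the activations themselves, with a hypothetical sharpness parameter `m` that is not named in the abstract; the function name `expectation_pooling` is likewise illustrative. The exact formulation in the paper and in the ePooling repository may differ.

```python
import numpy as np


def expectation_pooling(x, m=1.0):
    """Pool the last axis of `x` by a softmax-weighted average.

    Assumption (not taken from the abstract): each position i gets
    weight exp(m * x_i) / sum_j exp(m * x_j), and the output is the
    expectation of x under those weights. `m` is a hypothetical
    sharpness hyperparameter; no trainable parameters are added.
    """
    x = np.asarray(x, dtype=float)
    z = m * x
    # Subtract the per-row maximum for numerical stability.
    z = z - z.max(axis=-1, keepdims=True)
    w = np.exp(z)
    w = w / w.sum(axis=-1, keepdims=True)
    return (w * x).sum(axis=-1)


# Activations of one convolutional filter along a sequence.
activations = np.array([1.0, 2.0, 3.0])
mean_like = expectation_pooling(activations, m=0.0)   # uniform weights
max_like = expectation_pooling(activations, m=50.0)   # weights concentrate on the maximum
```

Under this assumption, `m = 0` reduces to global average pooling and a large `m` approaches global max pooling, so the operation interpolates between the two standard global pooling choices.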
Funder
National Key Research and Development Program of China
National Key Basic Research Project of China
National Natural Science Foundation of China
National Key R&D Program of China
China 863 Program
Beijing Advanced Innovation Center for Genomics
State Key Laboratory of Protein and Plant Gene Research, Peking University
Publisher
Oxford University Press (OUP)
Subject
Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability
Cited by
27 articles.