Expectation pooling: an effective and interpretable pooling method for predicting DNA–protein binding

Author:

Luo Xiao1,Tu Xinming2,Ding Yang2,Gao Ge2ORCID,Deng Minghua13ORCID

Affiliation:

1. School of Mathematical Sciences

2. Biomedical Pioneering Innovation Center (BIOPIC), Beijing Advanced Innovation Center for Genomics (ICG), Center for Bioinformatics (CBI), and the State Key Laboratory of Protein and Plant Gene Research at School of Life Sciences

3. Center for Quantitative Biology, Peking University, Beijing, China

Abstract

AbstractMotivationConvolutional neural networks (CNNs) have outperformed conventional methods in modeling the sequence specificity of DNA–protein binding. While previous studies have built a connection between CNNs and probabilistic models, simple models of CNNs cannot achieve sufficient accuracy on this problem. Recently, some methods of neural networks have increased performance using complex neural networks whose results cannot be directly interpreted. However, it is difficult to combine probabilistic models and CNNs effectively to improve DNA–protein binding predictions.ResultsIn this article, we present a novel global pooling method: expectation pooling for predicting DNA–protein binding. Our pooling method stems naturally from the expectation maximization algorithm, and its benefits can be interpreted both statistically and via deep learning theory. Through experiments, we demonstrate that our pooling method improves the prediction performance DNA–protein binding. Our interpretable pooling method combines probabilistic ideas with global pooling by taking the expectations of inputs without increasing the number of parameters. We also analyze the hyperparameters in our method and propose optional structures to help fit different datasets. We explore how to effectively utilize these novel pooling methods and show that combining statistical methods with deep learning is highly beneficial, which is promising and meaningful for future studies in this field.Availability and implementationAll code is public in https://github.com/gao-lab/ePooling.Supplementary informationSupplementary data are available at Bioinformatics online.

Funder

National Key Research and Development Program of China

National Key Basic Research Project of China

National Natural Science Foundation of China

National Key R&D Program of China

China 863 Program

Beijing Advanced Innovation Center for Genomics

State Key Laboratory of Protein and Plant Gene Research, Peking University

Publisher

Oxford University Press (OUP)

Subject

Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Statistics and Probability

Cited by 27 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3