Affiliation:
1. School of Computer Science and Engineering, Nanjing University of Science and Technology, China
2. Monash Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia
Abstract
Abstract
Knowledge of the specificity of DNA-protein binding is crucial for understanding the mechanisms of gene expression, regulation and gene therapy. In recent years, deep-learning-based methods for predicting DNA-protein binding from sequence data have achieved significant success. Nevertheless, the current state-of-the-art computational methods have some drawbacks associated with the use of limited datasets with insufficient experimental data. To address this, we propose a novel transfer learning-based method, termed SAResNet, which combines the self-attention mechanism and residual network structure. More specifically, the attention-driven module captures the position information of the sequence, while the residual network structure guarantees that the high-level features of the binding site can be extracted. Meanwhile, the pre-training strategy used by SAResNet improves the learning ability of the network and accelerates the convergence speed of the network during transfer learning. The performance of SAResNet is extensively tested on 690 datasets from the ChIP-seq experiments with an average AUC of 92.0%, which is 4.4% higher than that of the best state-of-the-art method currently available. When tested on smaller datasets, the predictive performance is more clearly improved. Overall, we demonstrate that the superior performance of DNA-protein binding prediction on DNA sequences can be achieved by combining the attention mechanism and residual structure, and a novel pipeline is accordingly developed. The proposed methodology is generally applicable and can be used to address any other sequence classification problems.
Funder
National Institutes of Health
Australian Research Council
National Health and Medical Research Council
Fundamental Research Funds for the Central Universities
Natural Science Foundation of Shanghai
National Natural Science Foundation of China
Publisher
Oxford University Press (OUP)
Subject
Molecular Biology,Information Systems
Cited by
34 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献