Affiliation:
1. Department of Computer Science, Faculty of Social Science and Humanities, Muhammad Nawaz Sharif University of Agriculture, Multan, Pakistan
2. Department of Biotechnology, Institute of Plant Breeding and Biotechnology, Muhammad Nawaz Sharif University of Agriculture, Multan, Pakistan
Abstract
Background:
With the rapid development of sequencing methods in recent years,
binding sites have been systematically identified in such projects as Nested-MICA and MEME.
Predicting DNA motifs with higher accuracy and precision has been an important task for
bioinformaticians. Nevertheless, experimental approaches remain time-consuming for large data sets,
making computational identification of binding sites indispensable.
Objective:
To facilitate the identification of binding sites, we propose a deep learning architecture, named Deep-BSC
(Deep-Learning Binary Search Classification), that predicts binding sites in a raw DNA sequence with higher precision and
accuracy.
Methods:
Our proposed architecture relies purely on the raw DNA sequence to predict protein binding
sites using a convolutional neural network (CNN). We trained our deep learning
model on binding sites at the nucleotide level. The DNA sequence of A. thaliana is used in this study
because it is a model plant.
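As a rough illustration of how a CNN can read raw DNA, the sketch below one-hot encodes a sequence, slides a single filter over it, and applies ReLU followed by global max-pooling. It is not the Deep-BSC implementation: the function names, the `TATA` motif, and the hand-set filter weights are assumptions made for this example, whereas a trained network would learn its filters from data.

```python
# Illustrative sketch only; not the Deep-BSC code. Pure standard-library Python.
ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 values (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET]
            for base in seq.upper()]

def conv1d(encoded, kernel):
    """Slide one filter over the encoded sequence ('valid' cross-correlation, stride 1)."""
    width = len(kernel)
    scores = []
    for start in range(len(encoded) - width + 1):
        total = 0.0
        for offset in range(width):
            for channel in range(4):
                total += encoded[start + offset][channel] * kernel[offset][channel]
        scores.append(total)
    return scores

def relu(values):
    """Clamp negative activations to zero."""
    return [max(0.0, v) for v in values]

def global_max_pool(values):
    """Return the strongest activation and the window start where it occurs."""
    best = max(range(len(values)), key=lambda i: values[i])
    return values[best], best

# Hypothetical filter: +1 for the motif base at each position, -1 for any other base.
MOTIF = "TATA"
KERNEL = [[1.0 if letter == base else -1.0 for letter in ALPHABET] for base in MOTIF]

def score_sequence(seq):
    """Score a sequence (at least as long as the filter) with the single hand-set filter."""
    activations = relu(conv1d(one_hot(seq), KERNEL))
    return global_max_pool(activations)
```

For example, `score_sequence("GGCTATAAGC")` returns `(4.0, 3)`: the filter matches all four bases of the window starting at index 3. Max-pooling keeps only that strongest response, so the score does not depend on where in the sequence the motif occurs.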
Results:
The results demonstrate the effectiveness and efficiency of our method in classifying
binding sites against random sequences using deep learning. We constructed a CNN with different
layers and filters to show the usefulness of the max-pooling technique in the proposed method. To make
our approach interpretable, we further visualized binding sites in a saliency map and
successfully identified similar motifs in the raw sequence. The proposed computational framework
is time- and resource-efficient.
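To make the idea of a saliency map concrete, the following sketch computes a per-nucleotide gradient-times-input saliency for a toy model with one convolutional filter, ReLU, and global max-pooling. It is a simplified stand-in, not the visualization pipeline used for Deep-BSC: the `TATA` filter and all names are hypothetical, and the input is assumed to contain only A, C, G, and T and to be at least as long as the filter.

```python
# Illustrative sketch only; the filter weights below are hand-set, not learned.
ALPHABET = "ACGT"
MOTIF = "TATA"  # hypothetical motif used to build the toy filter
KERNEL = [[1.0 if letter == base else -1.0 for letter in ALPHABET] for base in MOTIF]

def window_score(seq, start):
    """Filter response for the window that begins at `start`."""
    return sum(KERNEL[offset][ALPHABET.index(seq[start + offset])]
               for offset in range(len(KERNEL)))

def saliency(seq):
    """Gradient-times-input saliency of the max-pooled score, one value per base."""
    seq = seq.upper()
    width = len(KERNEL)
    starts = range(len(seq) - width + 1)
    # Global max-pooling passes gradient only to the best-scoring window.
    best = max(starts, key=lambda s: window_score(seq, s))
    result = [0.0] * len(seq)
    if window_score(seq, best) <= 0.0:
        return result  # ReLU is inactive, so the gradient is zero everywhere
    for offset in range(width):
        channel = ALPHABET.index(seq[best + offset])
        # For a one-hot input, gradient times input is the filter weight of the observed base.
        result[best + offset] = KERNEL[offset][channel]
    return result
```

Calling `saliency("GGCTATAAGC")` gives non-zero values only at indices 3 to 6, the positions covered by the motif, which is the kind of localized signal a saliency map is meant to expose.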
Conclusion:
Deep-BSC enables the identification of binding sites in DNA sequences via a highly accurate CNN. The
proposed computational framework can also be applied to problems such as identifying operators, genomic repeats, DNA
markers, and enzyme recognition sites, thereby promoting the use of the Deep-BSC method in the life sciences.
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry
Cited by
4 articles.