Affiliation:
1. Department of Pharmacy, The Affiliated Hospital of North Sichuan Medical College, Nanchong 637000, P. R. China
2. Department of Pharmacology, School of Pharmacy, Southwest Medical University Luzhou 646000, P. R. China
Abstract
N4-methyladenosine (4mC) methylation is an essential epigenetic modification of deoxyribonucleic acid (DNA) that plays a key role in many biological processes such as gene expression, gene replication and transcriptional regulation. Genome-wide identification and analysis of the 4mC sites can better reveal the epigenetic mechanisms that regulate various biological processes. Although some high-throughput genomic experimental methods can effectively facilitate the identification in a genome-wide scale, they are still too expensive and laborious for routine use. Computational methods can compensate for these disadvantages, but they still leave much room for performance improvement. In this study, we develop a non-NN-style deep learning-based approach for accurately predicting 4mC sites from genomic DNA sequence. We generate various informative features represented sequence fragments around 4mC sites, and subsequently implement them into a deep forest (DF) model. After training the deep model using 10-fold cross-validation, the overall accuracies of 85.0%, 90.0%, and 87.8% were achieved for three representative model organisms, A. thaliana, C. elegans, and D. melanogaster, respectively. In addition, extensive experiment results show that our proposed approach outperforms other existing state-of-the-art predictors in the 4mC identification. Our approach stands for the first DF-based algorithm for the prediction of 4mC sites, providing a novel idea in this field.
Funder
National Natural Science Foundation of China
Luzhou Science and Technology Bureau
Publisher
World Scientific Pub Co Pte Ltd
Subject
Computer Science Applications,Molecular Biology,Biochemistry