4mCPred-CNN—Prediction of DNA N4-Methylcytosine in the Mouse Genome Using a Convolutional Neural Network

Published:2021-02-20 Issue:2 Volume:12 Page:296
ISSN:2073-4425
Container-title:Genes
language:en
Short-container-title:Genes

Author:

Abbas Zeeshan, Tayara Hilal, Chong Kil To

Abstract

Among DNA modifications, N4-methylcytosine (4mC) is one of the most significant, and it is linked to cell proliferation and gene expression. Understanding its different biological functions requires accurate detection of 4mC sites. Although several techniques exist for predicting 4mC sites in different genomes, based on both machine learning (ML) and convolutional neural networks (CNNs), there is no CNN-based tool for identifying 4mC sites in the mouse genome. In this article, a CNN-based model named 4mCPred-CNN was developed to classify 4mC locations in the mouse genome. Until now, only two ML-based models existed for this purpose; they relied on several feature encoding schemes and still left considerable room for improving prediction accuracy. Using only a single feature encoding scheme, one-hot encoding, the proposed model outperformed both of the previous ML-based techniques. In a ten-fold validation test, 4mCPred-CNN achieved an accuracy of 85.71% and a Matthews correlation coefficient (MCC) of 0.717. On an independent dataset, it achieved an accuracy of 87.50% with an MCC of 0.750. These results indicate that the proposed model can be of practical use to researchers in biology and bioinformatics.
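
The abstract names one-hot encoding as the only feature scheme and reports accuracy and MCC as the evaluation metrics. The following is a minimal sketch of both ideas, not the authors' implementation: the channel order `ACGT`, the all-zero row for unknown bases, the function names, and the confusion-matrix counts are all assumptions made for illustration.

```python
import math

# Assumed channel order; the abstract does not specify one.
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence):
    """Map a DNA string to a list of 4-element 0/1 rows, one row per base."""
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in NUCLEOTIDES:  # unknown symbols such as N stay all-zero (assumption)
            row[NUCLEOTIDES.index(base)] = 1
        rows.append(row)
    return rows


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Return 0.0 when any marginal is empty and the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


encoded = one_hot_encode("ACGTN")

# Hypothetical counts chosen for illustration; they are NOT taken from the paper.
acc = accuracy(tp=70, tn=70, fp=10, fn=10)
score = mcc(tp=70, tn=70, fp=10, fn=10)

print(encoded)
print(acc, score)
```

For a balanced test set with equal sensitivity and specificity, MCC reduces to 2 × accuracy − 1. The illustrative counts above fall into that case and happen to give the same accuracy and MCC pair as the independent-dataset figures quoted in the abstract, although the paper's actual confusion-matrix counts are not given here.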

Funder

National Research Foundation of Korea

Publisher

MDPI AG

Subject

Genetics (clinical),Genetics

Link

https://www.mdpi.com/2073-4425/12/2/296/pdf
