An Information Gain-based Method for Evaluating the Classification Power of Features Towards Identifying Enhancers

Published: 2020-11-11 Issue: 6 Volume: 15 Page: 574-580
ISSN: 1574-8936
Container-title: Current Bioinformatics
Language: en
Short-container-title: CBIO

Author:

Zhang Tianjiao¹, Wang Rongjie¹, Jiang Qinghua², Wang Yadong¹

Affiliation:

1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China

2. School of Life Science and Technology, Harbin Institute of Technology, Harbin, China

Abstract

Background: Enhancers are cis-regulatory DNA elements that enhance gene expression. Because most enhancers are located far from transcription start sites, they are difficult to identify. Like other regulatory elements, the regions around enhancers carry a variety of features that can help in enhancer recognition.

Objective: The classification power of features differs significantly, so the performance of existing methods that use one or a few features to identify enhancers varies greatly. Evaluating the classification power of each feature can therefore improve the performance of enhancer prediction.

Methods: We present an evaluation method based on Information Gain (IG) that captures the change in the entropy of enhancer recognition given each feature. To validate our method, experiments using Single Feature Prediction Accuracy (SFPA) were conducted on each feature.

Results: The average IG values of the sequence, transcriptional, and epigenetic features are 0.068, 0.213, and 0.299, respectively. Under SFPA, the average AUC values of the sequence, transcriptional, and epigenetic features are 0.534, 0.605, and 0.647, respectively. These verification results are consistent with our evaluation results.

Conclusion: This IG-based method can effectively evaluate the classification power of features for identifying enhancers. Compared with sequence features, epigenetic features are more effective for recognizing enhancers.
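To make the two quantities in the abstract concrete, the following is a minimal sketch of the textbook definition of information gain for a discretized feature, together with a rank-based AUC for a single feature. It is not the authors' implementation: the abstract does not say how features were discretized or which classifier SFPA uses, so the function names, the toy labels, the "high"/"low" bins, and the use of the raw feature value as the prediction score are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_bins, labels):
    """IG = H(labels) - sum over bins v of p(v) * H(labels | feature == v)."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature_bins, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def single_feature_auc(scores, labels):
    """AUC when samples are ranked by one feature alone.

    Computed as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = enhancer, 0 = non-enhancer.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
# A binned feature that partly separates the classes (assumed values).
informative = ["high", "high", "high", "low", "low", "low", "low", "high"]
# A binned feature that is independent of the class (assumed values).
uninformative = ["high", "low", "high", "low", "high", "low", "high", "low"]
# A continuous version of the informative feature (assumed values).
signal = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2, 0.1, 0.6]

ig_informative = information_gain(informative, labels)
ig_uninformative = information_gain(uninformative, labels)
auc_signal = single_feature_auc(signal, labels)

print(f"IG (informative):   {ig_informative:.3f}")
print(f"IG (uninformative): {ig_uninformative:.3f}")
print(f"AUC (signal):       {auc_signal:.3f}")
```

In this toy setup the feature with the higher IG is also the one that ranks enhancers above non-enhancers more often, which is the kind of agreement between IG and single-feature AUC that the Results paragraph reports across feature types.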

Publisher

Bentham Science Publishers Ltd.

Subject

Computational Mathematics, Genetics, Molecular Biology, Biochemistry

Cited by 6 articles.

1. iEnhancer-MRBF: Identifying enhancers and their strength with a multiple Laplacian-regularized radial basis function network;Methods;2022-12

2. Identifying and Classifying Enhancers by Dinucleotide-Based Auto-Cross Covariance and Attention-Based Bi-LSTM;Computational and Mathematical Methods in Medicine;2022-04-05

3. iEnhancer-EBLSTM: Identifying Enhancers and Strengths by Ensembles of Bidirectional Long Short-Term Memory;Frontiers in Genetics;2021-03-23

4. Prevention and Control of Pathogens Based on Big-Data Mining and Visualization Analysis;Frontiers in Molecular Biosciences;2021-02-25

5. Identification of cyclin protein using gradient boost decision tree algorithm;Computational and Structural Biotechnology Journal;2021