iEnhancer-SKNN: a stacking ensemble learning-based method for enhancer identification and classification using sequence information

Published: 2023-01-28; Volume: 22; Issue: 3; Pages: 302-311
ISSN: 2041-2649
Container-title: Briefings in Functional Genomics
Language: en

Author:

Wu Hao¹², Liu Mengdi¹, Zhang Pengyu¹, Zhang Hongming¹

Affiliation:

1. College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China

2. School of Software, Shandong University, Jinan, 250101, Shandong, China

Abstract

Enhancers, a class of distal cis-regulatory elements located in the non-coding region of DNA, play a key role in gene regulation. Identifying enhancers from DNA sequence data is difficult for three reasons: enhancers are freely distributed across the non-coding region, they lack specific sequence features, and they can lie far from their target promoters. This study therefore presents a stacking ensemble learning method that identifies enhancers and classifies them as strong or weak. First, a fused feature matrix is obtained by combining four feature types: Kmer, PseDNC, PCPseDNC and Z-Curve9. Second, five K-Nearest Neighbor (KNN) models with different parameters are trained as base models, and Logistic Regression is used as the meta-model. Third, the stacking ensemble strategy combines the base models and the meta-model into a two-level learner, which is trained on the preprocessed feature sets. The resulting predictor, iEnhancer-SKNN, works in two layers. The first layer predicts whether a given DNA sequence is an enhancer or a non-enhancer. The second layer predicts whether each predicted enhancer is strong or weak. On the independent test dataset, iEnhancer-SKNN outperforms the compared predictors on both tasks. In enhancer identification it achieves an accuracy of 81.75%, 1.35% to 8.75% higher than other predictors. In enhancer classification it achieves an accuracy of 80.50%, 5.5% to 25.5% higher than other predictors. We also identify key transcription factor binding site motifs in the enhancer regions and explore the biological functions of the enhancers and these motifs.
Source code and data can be downloaded from https://github.com/HaoWuLab-Bioinformatics/iEnhancer-SKNN.
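The abstract describes the pipeline only at a high level. The following is a minimal pure-Python sketch of the same shape, not the authors' implementation (which is in the repository above). It rests on several assumptions:

* Only k-mer frequencies (k = 1..3) stand in for the fused Kmer/PseDNC/PCPseDNC/Z-Curve9 matrix.
* The five KNN base models differ only in neighbor count. The values 1, 3, 5, 7 and 9 are chosen for illustration.
* The logistic-regression meta-model is fit on 5-fold out-of-fold base-model outputs.

All function and class names, and the toy data, are hypothetical.

```python
import itertools
import math
import random


def kmer_features(seq, k):
    """Normalized frequency of each of the 4**k possible k-mers in seq."""
    kmers = ["".join(p) for p in itertools.product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows containing N or other symbols are skipped
            counts[j] += 1
            total += 1
    return [c / total for c in counts] if total else counts


def fused_features(seq, ks=(1, 2, 3)):
    """Hypothetical stand-in for the paper's feature fusion: k-mer profiles only."""
    return [v for k in ks for v in kmer_features(seq, k)]


def knn_probas(train_x, train_y, x, grid):
    """For each neighbor count in grid, fraction of label-1 points among the nearest (Euclidean)."""
    labels = [y for _, y in sorted(zip((math.dist(x, t) for t in train_x), train_y))]
    return [sum(labels[:k]) / len(labels[:k]) for k in grid]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=0.5, epochs=500):
    """Plain batch gradient descent on the logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


class StackedKNN:
    """KNN base models (different neighbor counts) stacked under a logistic-regression meta-model."""

    def __init__(self, grid=(1, 3, 5, 7, 9), n_folds=5):
        self.grid, self.n_folds = grid, n_folds  # neighbor counts are an assumption

    def fit(self, xs, ys):
        self.xs, self.ys = xs, ys
        # Each meta-feature row comes from base models that were not trained on that sample.
        meta = [None] * len(xs)
        for f in range(self.n_folds):
            held = set(range(f, len(xs), self.n_folds))
            tr_x = [x for i, x in enumerate(xs) if i not in held]
            tr_y = [y for i, y in enumerate(ys) if i not in held]
            for i in held:
                meta[i] = knn_probas(tr_x, tr_y, xs[i], self.grid)
        self.w, self.b = fit_logistic(meta, ys)
        return self

    def predict(self, x):
        # At prediction time the lazy KNN base models use the full training set.
        meta = knn_probas(self.xs, self.ys, x, self.grid)
        return int(sigmoid(sum(wi * m for wi, m in zip(self.w, meta)) + self.b) >= 0.5)


def predict_two_layer(layer1, layer2, seq):
    """Layer 1: enhancer vs non-enhancer; layer 2: strong vs weak, only for predicted enhancers."""
    x = fused_features(seq)
    if layer1.predict(x) == 0:
        return "non-enhancer"
    return "strong enhancer" if layer2.predict(x) == 1 else "weak enhancer"


# Usage on synthetic toy data (NOT the paper's benchmark): AT-only "non-enhancers",
# G-rich "strong" enhancers and C-rich "weak" enhancers.
rng = random.Random(0)
def toy(alphabet, n):
    return [fused_features("".join(rng.choice(alphabet) for _ in range(40))) for _ in range(n)]

non, strong, weak = toy("AT", 30), toy("GGGC", 20), toy("CCCG", 20)
layer1 = StackedKNN().fit(non + strong + weak, [0] * 30 + [1] * 40)
layer2 = StackedKNN().fit(strong + weak, [1] * 20 + [0] * 20)
print(predict_two_layer(layer1, layer2, "G" * 40))
```

Stacking ensembles are commonly trained on out-of-fold base-model outputs. This keeps the meta-model from learning on predictions that already saw the labels. Because KNN is a lazy learner, "refitting" the base models on the full training set for prediction amounts to searching all training points.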

Funder

National Natural Science Foundation of China

National Key Research and Development Program

Fundamental Research Funds of Shandong University

Publisher

Oxford University Press (OUP)

Subject

Genetics, Molecular Biology, Biochemistry, General Medicine

Link

https://academic.oup.com/bfg/article-pdf/22/3/302/50383829/elac057.pdf