Affiliation:
1. College of Information Engineering, Northwest A&F University, Yangling, 712100, Shaanxi, China
2. School of Software, Shandong University, Jinan, 250101, Shandong, China
Abstract
Enhancers, a class of distal cis-regulatory elements located in non-coding regions of DNA, play a key role in gene regulation. Identifying enhancers from DNA sequence data is difficult because they are distributed freely across non-coding regions, have no specific sequence features, and can lie far from their target promoters. This study therefore presents a stacking ensemble learning method that identifies enhancers and classifies them as strong or weak. First, we build a fused feature matrix by combining four feature encodings: Kmer, PseDNC, PCPseDNC and Z-Curve9. Second, five K-Nearest Neighbor (KNN) models with different parameters are trained as base models, and Logistic Regression serves as the meta-model. Third, the stacking ensemble learning strategy is used to construct a two-layer model from the base models and the meta-model, which is trained on the preprocessed feature sets. The proposed method, named iEnhancer-SKNN, is a two-layer prediction model: the first layer predicts whether a given DNA sequence is an enhancer or a non-enhancer, and the second layer determines whether a predicted enhancer is strong or weak. Evaluated on the independent testing dataset, iEnhancer-SKNN outperforms other predictors in predicting enhancers and their strength. In enhancer identification it achieves an accuracy of 81.75%, an improvement of 1.35% to 8.75% over other predictors; in enhancer classification it achieves an accuracy of 80.50%, an improvement of 5.5% to 25.5% over other predictors. Moreover, we identify key transcription factor binding site motifs in the enhancer regions and further explore the biological functions of the enhancers and these key motifs.
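To make the feature-fusion step concrete, the following is a minimal sketch of the simplest of the four encodings, Kmer, together with a plain concatenation that stands in for fusion. It is an illustration under stated assumptions, not the authors' implementation: the function names `kmer_frequencies` and `fuse_features`, the choice of normalized frequencies, and the skipping of windows with ambiguous bases are all hypothetical. PseDNC, PCPseDNC and Z-Curve9 are not implemented here.

```python
from itertools import product


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies over ACGT in lexicographic order.

    Assumption: windows containing characters outside ACGT (e.g. N) are skipped.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    # Guard against empty or fully ambiguous sequences.
    return [counts[m] / total if total else 0.0 for m in kmers]


def fuse_features(*vectors):
    """Concatenate several feature vectors into one fused vector (hypothetical helper)."""
    fused = []
    for vector in vectors:
        fused.extend(vector)
    return fused


if __name__ == "__main__":
    sequence = "ACGTACGTTTGA"  # toy sequence, not taken from the benchmark data
    fused = fuse_features(kmer_frequencies(sequence, 1), kmer_frequencies(sequence, 2))
    print(len(fused))  # 4 mononucleotide + 16 dinucleotide features
```

Stacking one fused vector per sequence row by row would give a feature matrix of the kind the abstract describes, with the remaining three encodings appended in the same way.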
Source code and data can be downloaded from https://github.com/HaoWuLab-Bioinformatics/iEnhancer-SKNN.
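Independently of the repository above, the stacking and two-layer ideas described in the abstract can be sketched with scikit-learn. This is a hedged illustration rather than the published code: the abstract does not state which KNN parameters differ between the five base models, so the neighbor counts `(3, 5, 7, 9, 11)`, the 5-fold internal cross-validation, the helper names `build_stacked_knn` and `predict_two_layer`, and the label encoding (0 = non-enhancer, 1 = weak, 2 = strong) are all assumptions.

```python
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier


def build_stacked_knn(neighbor_counts=(3, 5, 7, 9, 11), cv=5):
    """Stack several KNN base models under a Logistic Regression meta-model.

    Assumption: the base models differ only in n_neighbors.
    """
    base_models = [
        (f"knn_{k}", KNeighborsClassifier(n_neighbors=k)) for k in neighbor_counts
    ]
    return StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=cv,  # out-of-fold base predictions are used to train the meta-model
    )


def predict_two_layer(layer1, layer2, X):
    """Cascade two fitted binary classifiers.

    layer1: 1 = enhancer, 0 = non-enhancer.
    layer2: 1 = strong, 0 = weak (applied only to predicted enhancers).
    Returns 0 = non-enhancer, 1 = weak enhancer, 2 = strong enhancer.
    """
    X = np.asarray(X)
    labels = np.zeros(len(X), dtype=int)
    is_enhancer = layer1.predict(X) == 1
    if is_enhancer.any():
        labels[is_enhancer] = 1 + layer2.predict(X[is_enhancer])
    return labels


if __name__ == "__main__":
    # Synthetic, well-separated clusters stand in for real fused features.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.3, size=(40, 4)) for c in (0.0, 3.0, 6.0)])
    y = np.repeat([0, 1, 2], 40)
    layer1 = build_stacked_knn().fit(X, (y > 0).astype(int))
    layer2 = build_stacked_knn().fit(X[y > 0], (y[y > 0] == 2).astype(int))
    print(predict_two_layer(layer1, layer2, X)[:5])
```

The second layer is trained only on enhancer sequences, mirroring the description that it separates strong from weak among predicted enhancers. Each training fold must contain at least as many samples as the largest neighbor count.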
Funder
National Natural Science Foundation of China
National Key Research and Development Program
Fundamental Research Funds of Shandong University
Publisher
Oxford University Press (OUP)
Subject
Genetics, Molecular Biology, Biochemistry, General Medicine
Cited by
6 articles.