Affiliation:
1. School of Computer Science, Hefei Normal University, Hefei 230001, China
2. Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
3. Key Laboratory of Crop Biology of Anhui Province, Anhui Agricultural University, Hefei 230036, China
Abstract
Orphan genes (OGs) may evolve from noncoding sequences or be derived from older coding material. A proportion of OGs is present in every sequenced genome; they participate in the biochemical and physiological pathways of many species, and many may be associated with responses to environmental stress and with species-specific traits or regulatory patterns. However, identifying OGs is laborious and time-consuming. This paper presents an automated predictor, XGBoost-A2OGs (identification of OGs for angiosperms based on XGBoost), which identifies OGs in seven angiosperm species using hybrid features and XGBoost. In fivefold cross-validation and independent testing, the precision and accuracy of the proposed model reached 0.90 and 0.91, respectively, and in cross-species validation it outperformed the other classifiers tested, namely Random Forest, AdaBoost, GBDT, and SVM. Furthermore, subdividing the hybrid features into five sets showed that different hybrid feature sets influence OG prediction performance for the eudicot and monocot groups. Finally, testing on small-scale empirical datasets for each species separately, using the optimal hybrid features, revealed that the proposed model performed better for eudicot groups than for monocot groups.
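The evaluation terms used in the abstract (fivefold cross-validation, precision, accuracy) can be illustrated with a minimal sketch. It assumes binary labels in which 1 marks an orphan gene; the function names and the toy labels are hypothetical and are not taken from the paper, and no classifier is trained here.

```python
import random


def five_fold_indices(n_samples, n_folds=5, seed=0):
    """Shuffle sample indices and split them into n_folds disjoint test folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding over the shuffled list gives folds whose sizes differ by at most one.
    return [indices[i::n_folds] for i in range(n_folds)]


def precision_and_accuracy(y_true, y_pred):
    """Return (precision, accuracy) for binary labels, where 1 = orphan gene (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = correct / len(pairs)
    return precision, accuracy


# Toy labels, made up for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 1, 0]

folds = five_fold_indices(len(y_true))
precision, accuracy = precision_and_accuracy(y_true, y_pred)
```

In a full pipeline, each fold would serve once as the held-out test set while a model is fitted on the remaining four, and the metrics would be averaged across folds.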
Subject
Electrical and Electronic Engineering, Computer Networks and Communications, Hardware and Architecture, Signal Processing, Control and Systems Engineering