Affiliation:
1. Baotou Teacher’s College, Inner Mongolia University of Science and Technology, Baotou, China
2. Laboratory of Theoretical Biophysics, School of Physical Science and Technology, Inner Mongolia University, Hohhot, China
Abstract
Background:
Identifying non-coding RNA (ncRNA) at the organelle genome level is a
challenging task. In our previous work, we built an ncRNA dataset with less than 80% sequence
identity and proposed a method that combines the increment of diversity with a support vector
machine.
Objective:
Based on the ncRNA_361 dataset, a novel decision-making method, an improved
k-nearest neighbor (iKNN) classifier, is proposed.
Methods:
For the iKNN algorithm, the physicochemical features of nucleotides, the degeneracy of
genetic codons, and the topology of the secondary structure were selected as features to
characterize ncRNA. The incremental feature selection method was then used to optimize
the feature set.
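As a rough illustration of incremental feature selection in general (not the authors' implementation), the sketch below adds ranked features one at a time and keeps the best-scoring prefix. The function name `incremental_feature_selection`, the `ranked_features` ordering, and the `score` callback are all hypothetical; the abstract does not say how features were ranked or scored.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[int],
    score: Callable[[List[int]], float],
) -> Tuple[List[int], float]:
    """Return the best-scoring prefix of a ranked feature list.

    ranked_features: feature indices, most relevant first (ranking is assumed
        to have been done beforehand by some criterion).
    score: hypothetical callback that evaluates a feature subset, for example
        the cross-validated accuracy of a classifier trained on that subset.
    """
    best_subset: List[int] = []
    best_score = float("-inf")
    current: List[int] = []
    for feature in ranked_features:
        current.append(feature)          # grow the subset by one feature
        subset_score = score(current)    # evaluate the enlarged subset
        if subset_score > best_score:    # keep the first best prefix seen
            best_score = subset_score
            best_subset = list(current)
    return best_subset, best_score


if __name__ == "__main__":
    # Toy scores keyed by subset size, purely for demonstration.
    toy_scores = {1: 0.6, 2: 0.9, 3: 0.8}
    subset, value = incremental_feature_selection(
        [2, 0, 1], lambda s: toy_scores[len(s)]
    )
    print(subset, value)
```

In practice the `score` callback would wrap the classifier and the validation scheme, so the same loop works for any model.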
Results:
The results of iKNN indicated that the mean-value decision rule is distinctly
superior to both the traditional majority-vote decision rule and the Increment of Diversity
Combining Support Vector Machine (ID-SVM). The iKNN algorithm achieved an overall accuracy
of 97.368% in the jackknife test with k = 3.
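The contrast between the two decision rules can be sketched as follows. The abstract does not define the mean-value rule, so this sketch assumes one plausible reading: for each class, average the distances from the query to its k nearest members of that class, and assign the class with the smallest mean. The Euclidean distance, the function names, and the toy data are likewise assumptions and are not taken from the paper.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, class label)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_majority_vote(train: List[Sample], query: Sequence[float], k: int) -> str:
    """Traditional KNN: most common label among the k nearest samples."""
    nearest = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def knn_mean_value(train: List[Sample], query: Sequence[float], k: int) -> str:
    """Assumed mean-value rule: smallest mean distance to the k nearest
    samples of each class (uses fewer than k if a class is smaller)."""
    distances: Dict[str, List[float]] = {}
    for vector, label in train:
        distances.setdefault(label, []).append(euclidean(vector, query))
    means = {
        label: sum(sorted(d)[:k]) / min(k, len(d))
        for label, d in distances.items()
    }
    return min(means, key=means.get)


def jackknife_accuracy(
    data: List[Sample],
    classify: Callable[[List[Sample], Sequence[float], int], str],
    k: int,
) -> float:
    """Leave-one-out test: classify each sample using all the others."""
    correct = 0
    for i, (vector, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if classify(train, vector, k) == label:
            correct += 1
    return correct / len(data)


if __name__ == "__main__":
    # Toy one-dimensional data with two well-separated classes.
    toy = [([0.0], "a"), ([0.1], "a"), ([0.2], "a"),
           ([5.0], "b"), ([5.1], "b"), ([5.2], "b")]
    print(jackknife_accuracy(toy, knn_majority_vote, 3))
    print(jackknife_accuracy(toy, knn_mean_value, 3))
```

Under this reading, the mean-value rule compares classes on equal footing even when one class has far more samples than another, which is one way it could differ from a plain majority vote.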
Conclusion:
The triplets of the structure-sequence mode under reading
frames not only contain the entire sequence information but also reflect whether each base is
paired, and the topological parameters of the secondary structure further describe the ncRNA
secondary structure at the spatial level. The ncRNA dataset and the iKNN classifier are freely
available at http://202.207.14.87:8032/fuwu/iKNN/index.asp.
Funder
Inner Mongolia Autonomous Region
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry
Cited by
6 articles.