Affiliation:
1. School of Chemistry and Chemical Engineering, Qinghai Minzu University, Xining 810007, P.R. China
2. School of Chemistry and Chemical Engineering, Qinghai Normal University, Xining 810016, P.R. China
Abstract
Gene selection is a key step in gene expression data analysis. This paper proposes an SVM-based ensemble feature selection method. First, the method builds many subsets by Monte Carlo sampling. Second, it ranks all features on each subset and integrates the rankings into a final ranking list. Finally, the optimal feature set is determined by a backward feature elimination strategy. The method is applied to four public datasets, Leukemia, Prostate, Colorectal, and SMK_CAN, yielding 7, 10, 13, and 32 selected features, respectively. The AUC values obtained on independent test sets are 0.9867, 0.9796, 0.9571, and 0.9575, respectively. These results indicate that the features selected by the proposed method can improve sample classification accuracy, and that the method is therefore effective for gene selection from gene expression data.
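The three steps in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the SVM solver, the sampling fraction, the rank-integration rule, or the elimination criterion. The sketch assumes a Pegasos-style linear SVM without a bias term, ranking by absolute weight, integration by summed rank, and validation accuracy (rather than AUC) as the elimination criterion. All function names, parameter values, and the synthetic data are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=20, seed=0):
    """Pegasos-style subgradient descent for a linear SVM (assumed solver).

    Labels must be -1 or +1. No bias term. Returns the weight vector.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def rank_features(X, y):
    """Rank position of each feature (0 = best) by absolute SVM weight."""
    w = train_linear_svm(X, y)
    order = sorted(range(len(w)), key=lambda j: -abs(w[j]))
    ranks = [0] * len(w)
    for position, j in enumerate(order):
        ranks[j] = position
    return ranks


def ensemble_ranking(X, y, n_subsets=20, fraction=0.8, seed=0):
    """Step 1 and 2: Monte Carlo subsets, per-subset ranking, summed ranks."""
    rng = random.Random(seed)
    k = max(2, int(fraction * len(X)))
    totals = [0] * len(X[0])
    for _ in range(n_subsets):
        rows = rng.sample(range(len(X)), k)  # random subset of samples
        ranks = rank_features([X[i] for i in rows], [y[i] for i in rows])
        totals = [a + b for a, b in zip(totals, ranks)]
    # Features with the smallest summed rank come first.
    return sorted(range(len(totals)), key=lambda j: totals[j])


def columns(X, features):
    """Keep only the given feature columns."""
    return [[row[j] for j in features] for row in X]


def validation_accuracy(train_X, train_y, val_X, val_y, features):
    """Train on the chosen features and score on held-out samples."""
    w = train_linear_svm(columns(train_X, features), train_y)
    hits = sum(
        1 for xi, yi in zip(columns(val_X, features), val_y)
        if yi * sum(a * b for a, b in zip(w, xi)) > 0
    )
    return hits / len(val_y)


def backward_elimination(train_X, train_y, val_X, val_y, ranking):
    """Step 3: drop the lowest-ranked feature while the score does not fall."""
    current = list(ranking)
    best = list(current)
    best_score = validation_accuracy(train_X, train_y, val_X, val_y, current)
    while len(current) > 1:
        current = current[:-1]
        score = validation_accuracy(train_X, train_y, val_X, val_y, current)
        if score >= best_score:  # prefer the smaller set on ties
            best, best_score = list(current), score
    return best, best_score


# Synthetic data (hypothetical): feature 0 carries the class signal,
# features 1 to 4 are pure noise.
rng = random.Random(42)
labels = [1 if i % 2 == 0 else -1 for i in range(60)]
data = [
    [3.0 * lab + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(4)]
    for lab in labels
]
train_X, train_y = data[:40], labels[:40]
val_X, val_y = data[40:], labels[40:]

ranking = ensemble_ranking(train_X, train_y)
selected, score = backward_elimination(train_X, train_y, val_X, val_y, ranking)
print("ranking:", ranking, "selected:", selected, "score:", score)
```

On real expression data, the features would normally be standardized first, and the held-out score would be replaced by whatever criterion the study uses, such as the AUC reported above.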
Funder
Qinghai Provincial Natural Science Fund
Subject
Computational Mathematics, Genetics, Molecular Biology, Statistics and Probability