Affiliation:
1. School of Computer and Information Technology, Beijing Jiaotong University, Beijing, 100044, P. R. China
2. Department of Mathematics, Hong Kong Baptist University, Kowloon Toog, Hong Kong, P. R. China
Abstract
Microarray data profiles gene expression on a whole genome scale, and provides a good way to study associations between gene expression and occurrence or progression of cancer disease. Many researchers realized that microarray data is useful to predict cancer cases. However, the high dimension of gene expressions, which is significantly larger than the sample size, makes this task very difficult. It is very important to identify the significant genes causing cancer. Many feature selection algorithms have been proposed focusing on improving cancer predictive accuracy at the expense of ignoring the correlations between the features. In this work, a novel framework (named by SGS) is presented for significant genes selection and efficient cancer case classification. The proposed framework first performs a clustering algorithm to find the gene groups where genes in each group have higher correlation coefficient, and then selects (1) the significant (2) genes in each group using the Bayesian Lasso method and important gene groups using the group Lasso method, and finally builds a prediction model based on the shrinkage gene space with efficient classification algorithm (such as support vector machine (SVM), 1NN, and regression). Experimental results on public available microarray data show that the proposed framework often outperforms the existing feature selection and prediction methods such as SAM, information gain (IG), and Lasso-type prediction models.
Publisher
World Scientific Pub Co Pte Lt
Subject
Computer Science Applications,Information Systems
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献