Affiliation:
1. Gandhi Institute for Technology, India
2. KMBB College of Engineering and Technology, India
Abstract
This chapter discusses several important issues: pre-processing of gene expression data, the curse of dimensionality, feature extraction and selection, and measuring or estimating classifier performance. Although these concepts are relatively well understood by technical specialists such as statisticians, electrical engineers, and computer scientists, they are relatively new to biologists and bioinformaticians. As a result, some misconceptions about the use of classification methods persist. For instance, in most classifier design strategies, gene or feature selection is an integral part of the classifier, so it must be part of the cross-validation process used to estimate the classifier's prediction performance. Simon (2003) discussed several studies published in prestigious journals in which this issue was overlooked and optimistically biased prediction performances were reported. Furthermore, the authors discuss important properties of the different approaches, such as generalizability or sensitivity to overtraining, built-in feature selection, ability to report prediction strength, and transparency, to provide a quick and concise reference. Classifier design and clustering methods are relatively well established; however, the complexity of the problems rooted in microarray technology hinders the applicability of classification methods as diagnostic and prognostic predictors or class-discovery tools in medicine.
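The point about feature selection and cross-validation can be illustrated with a minimal sketch. It is not taken from the chapter: the pure-noise data, the mean-difference gene ranking, the nearest-centroid classifier, and all function names (`select_top_features`, `nearest_centroid_predict`, `loo_accuracy`) are assumptions chosen for illustration. Because the labels carry no real signal, an honest estimate should sit near chance, while selecting genes on the full data set before cross-validation tends to report a much higher accuracy.

```python
import random


def select_top_features(X, y, k):
    """Rank features by absolute difference of class means; return top-k indices."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def nearest_centroid_predict(X_train, y_train, x, features):
    """Assign x to the class whose centroid (on the chosen features) is closest."""
    best_label, best_dist = None, float("inf")
    for label in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, k, select_inside):
    """Leave-one-out accuracy, with selection inside or outside the CV loop."""
    # Selecting once on all samples lets the held-out sample influence the genes.
    fixed = None if select_inside else select_top_features(X, y, k)
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        if select_inside:
            features = select_top_features(X_train, y_train, k)
        else:
            features = fixed
        correct += nearest_centroid_predict(X_train, y_train, X[i], features) == y[i]
    return correct / len(X)


# Synthetic "expression" data: pure noise, so no gene is truly informative.
rng = random.Random(0)
n_samples, n_features, k = 20, 1000, 10
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [0] * 10 + [1] * 10

biased = loo_accuracy(X, y, k, select_inside=False)
honest = loo_accuracy(X, y, k, select_inside=True)
print(f"selection outside CV: {biased:.2f}")
print(f"selection inside CV:  {honest:.2f}")
```

The only difference between the two estimates is whether `select_top_features` sees the held-out sample, which is the source of the optimistic bias described above.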
References: 69 articles.
1. Aber, M. M., et al. (2013). Analysis of machine learning techniques for gene selection and classification of microarray data. In Proceedings of the 6th International Conference on Information Technology. ICIT.
2. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling
3. Towards a novel classification of human malignancies based on gene expression patterns
4. Is cross-validation valid for small-sample microarray classification?
5. Cawley, G. C., Talbot, N. L. C., & Girolami, M. (2007). Sparse multinomial logistic regression via Bayesian l1 regularisation. In Proceedings of NIPS. NIPS.
Cited by
5 articles.