Affiliation:
1. Fundamental and Applied Mathematics Laboratory, Department of Mathematics and Computer Sciences, Faculty of Sciences Ain Chock, Hassan II University of Casablanca, Casablanca, Morocco
Abstract
Microarray expression datasets generate a huge number of genes, but only a few genes provide information about cancer diseases. In this context, feature selection approaches have been developed to deal with this problem. Filter-based methods, in particular, select the relevant genes and remove the irrelevant ones using different evaluation metrics. In this study, we shed light on nine univariate filter methods. Three categories of filter methods were investigated using eight microarray datasets, including binary and multi-class samples. The support vector machine and Naive Bayes classifiers were used to assess classification accuracy. Different comparison methods were used to assist the researchers in visualizing the performance of each studied filter. Precisely, statistical tests were applied in terms of classification accuracy, and the feature ranking similarity of the filter methods was studied based on a rank correlation measure.