Affiliation:
1. Department of Computer Science and Engineering, National Institute of Technology, Raipur, India
Abstract
Background:
Biomedical data are dominated by continuous real values. In a feature set, such values
tend to cause problems such as underfitting, the curse of dimensionality, and a higher
misclassification rate due to higher variance. Pre-processing the dataset
mitigates these side effects and has proven successful in maintaining adequate accuracy.
Aims:
Feature selection and discretization are two necessary preprocessing steps that have been
effectively employed to handle redundancy in biomedical data. In previous work, however,
the lack of a unified effort to integrate feature selection and discretization
in addressing data redundancy has left the field disjoint and fragmented. This paper proposes
a novel multi-objective dimensionality reduction framework that incorporates both
discretization and feature reduction in an ensemble model.
The selection of optimal features, and the categorization of the selected features into
discretized and non-discretized ones, are governed by a multi-objective genetic algorithm (NSGA-II).
Two objectives serve as the fitness criteria: minimizing the error rate during feature selection
and maximizing the information gain during discretization.
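As a rough illustration of the two objectives, the sketch below computes the information gain of a single binary cut on a continuous feature and checks Pareto dominance between two candidate solutions scored as (error rate, information gain). The function names, the `threshold` argument, and the tuple layout are assumptions made for illustration; the abstract does not specify how the paper implements either objective.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction from splitting `values` at `threshold` (values <= threshold go left)."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def dominates(a, b):
    """True if candidate a = (error_rate, info_gain) Pareto-dominates candidate b.

    Error rate is minimized and information gain is maximized (assumed layout).
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better
```

A multi-objective search such as NSGA-II ranks candidates by this kind of dominance relation rather than by a single weighted score, which is why neither objective needs to be scaled against the other.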
Methods:
The proposed model uses a wrapper-based feature selection algorithm to select the optimal
features and categorizes the selected features into two blocks, discretized and non-discretized.
Features in the discretized block undergo binary discretization,
while features in the second block are not discretized and are used in their original form.
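The two-block idea can be sketched as follows. This is a minimal illustration under stated assumptions: the three gene codes, the per-feature `cut_points`, and the function name `apply_chromosome` are hypothetical, since the abstract does not describe the actual chromosome encoding or how cut points are chosen.

```python
# Hypothetical gene codes; the paper's real encoding is not given in the abstract.
DROPPED, KEEP_RAW, KEEP_DISCRETIZED = 0, 1, 2


def apply_chromosome(rows, genes, cut_points):
    """Project `rows` onto the selected features, binarizing the discretized block.

    rows: list of samples, each a list of floats
    genes: one gene code per feature column
    cut_points: one threshold per feature column (used only for KEEP_DISCRETIZED)
    """
    if len(genes) != len(cut_points):
        raise ValueError("genes and cut_points must have one entry per feature")
    transformed = []
    for row in rows:
        if len(row) != len(genes):
            raise ValueError("each row must have one value per feature")
        new_row = []
        for value, gene, cut in zip(row, genes, cut_points):
            if gene == KEEP_DISCRETIZED:
                # Binary discretization: 1 above the cut point, 0 otherwise.
                new_row.append(1 if value > cut else 0)
            elif gene == KEEP_RAW:
                # Non-discretized block: keep the original continuous value.
                new_row.append(value)
            # DROPPED features are left out of the subset entirely.
        transformed.append(new_row)
    return transformed
```

In a wrapper-based setup, the transformed rows would then be passed to the base classifier, and its error rate would serve as one fitness value for the chromosome.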
Results:
To establish the validity and acceptability of the proposed ensemble model, experiments were
conducted on fifteen medical datasets, and metrics such as accuracy, mean, and standard deviation
were computed to evaluate the performance of the classifiers.
Conclusion:
Extensive experiments on these datasets indicate that the proposed
model improves the classification rate and outperforms the base learner.
Publisher
Bentham Science Publishers Ltd.
Subject
Radiology, Nuclear Medicine and Imaging
Cited by
5 articles.