Affiliation:
1. School of Computer and Information Engineering, Henan University, Kaifeng, China
2. China Mobile Online Service Co. Ltd., China
3. Academy of Arts & Design, Tsinghua University, Beijing, China
4. College of Computer Science and Technology, Henan Polytechnic University, Jiaozuo, China
Abstract
Background:
The massive amount of biomedical data accumulated in the past decades can
be utilized for diagnosing disease.
Objective:
However, the high dimensionality, small sample sizes, and irrelevant features of such data often have
a negative influence on the accuracy and speed of disease prediction. Some existing machine learning
models cannot capture the patterns in these datasets accurately without feature selection.
Methods:
Filter and wrapper are two prevailing feature selection methods. The filter method is fast but
has low prediction accuracy, while the wrapper method can obtain high accuracy but has a formidable
computational cost. Given the drawbacks of using filter or wrapper individually, a novel feature selection
method, called MRMR-EFPATS, is proposed, which hybridizes the filter method Minimum Redundancy Maximum
Relevance (MRMR) and a wrapper method based on an improved Flower Pollination Algorithm (FPA).
First, MRMR is employed to rank the features and quickly screen out the important ones. These features are
then used to form the individuals of the population for the subsequent wrapper method, for faster
convergence and less computational time. Then, due to its efficiency and flexibility, FPA is adopted to
further discover an optimal feature subset.
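The MRMR ranking step can be illustrated with a minimal sketch. It assumes discrete-valued features and the difference form of the criterion (relevance minus mean redundancy); the abstract does not say which MRMR variant or discretization the authors use, and the names mutual_information and mrmr_rank are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log(p(x,y) / (p(x) p(y))), written in terms of raw counts.
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_rank(columns, labels, k):
    """Greedily pick k feature indices with high relevance and low redundancy.

    columns: list of feature columns (one sequence per feature).
    Assumption: difference criterion, relevance - mean redundancy.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    # Start from the single most relevant feature.
    selected = [max(range(len(columns)), key=relevance.__getitem__)]
    while len(selected) < k:
        best, best_score = None, float("-inf")
        for j in range(len(columns)):
            if j in selected:
                continue
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

In a hybrid scheme of the kind described, only the top-ranked indices returned by such a function would be handed to the wrapper stage, so the search runs over a much smaller feature space.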
Result:
FPA still has some drawbacks, such as a slow convergence rate, inadequacy in searching for
new solutions, and a tendency to be trapped in local optima. In our work, an elite strategy is adopted to
improve the convergence speed of FPA. Tabu search and Adaptive Gaussian Mutation are employed
to improve the search capability of FPA and to escape from local optima. Here, the KNN classifier with
5-fold cross-validation (CV) is utilized to evaluate the classification accuracy.
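How a wrapper can score a candidate feature subset with KNN and 5-fold CV is sketched below. The abstract states only the classifier and the number of folds; the neighbour count k=1, the squared Euclidean distance, the unshuffled interleaved fold assignment, and the names knn_predict and cv_accuracy are all assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, mask, k=1, folds=5):
    """Pooled cross-validated KNN accuracy using only features where mask is 1.

    Assumption: sample i goes to fold i % folds (no shuffling or stratification).
    """
    keep = [j for j, bit in enumerate(mask) if bit]
    if not keep:
        return 0.0  # an empty subset cannot classify anything
    Xs = [[row[j] for j in keep] for row in X]
    correct = 0
    for f in range(folds):
        test_idx = [i for i in range(len(Xs)) if i % folds == f]
        train_idx = [i for i in range(len(Xs)) if i % folds != f]
        tX = [Xs[i] for i in train_idx]
        ty = [y[i] for i in train_idx]
        correct += sum(knn_predict(tX, ty, Xs[i], k) == y[i] for i in test_idx)
    return correct / len(Xs)
```

A search procedure such as FPA would call a function like this once per candidate binary mask and treat the returned accuracy as the fitness to maximize.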
Conclusion:
Extensive experimental results on six public high-dimensional biomedical datasets show
that the proposed MRMR-EFPATS achieves superior performance compared to other state-of-the-art
methods.
Publisher
Bentham Science Publishers Ltd.
Subject
Computational Mathematics, Genetics, Molecular Biology, Biochemistry
Cited by
10 articles.