Author:
Pan Xiaoying,Sun Jun,Yu Huimin,Xue Yufeng
Abstract
AbstractGene expression profile data have high-dimensionality with a small number of samples. These data characteristics lead to a long training time and low performance in predictive model construction. To address this issue, the paper proposes a feature selection algorithm using non-dominant feature-guide search. The algorithm adopts a filtering framework based on feature sorting and search strategy to overcome the problems of long training time and poor performance. First, the feature pre-selection is completed according to the calculated feature category correlation. Second, a multi-objective optimization feature selection model is constructed. Non-dominant features are defined according to the Pareto dominance theory. Combined with the bidirectional search strategy, the Pareto dominance features under the current category maximum relevance feature are removed one by one. Finally, the optimal feature subset with maximum correlation and minimum redundancy is obtained. Experimental results on six gene expression data sets show that the algorithm is much better than Fisher score, maximum information coefficient, composition of feature relevancy, mini-batch K-means normalized mutual information feature inclusion, and max-Relevance and Min-Redundancy algorithms. Compared to feature selection method based on maximum information coefficient and approximate Markov blanket, the algorithm not only has high computational efficiency but also can obtain better classification capabilities in a smaller dimension.
Funder
key technologies research and development program
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics,Engineering (miscellaneous),Information Systems,Artificial Intelligence
Reference28 articles.
1. Ram PK, Kuila P (2019) Feature selection from microarray data: genetic algorithm based approach[J]. J Inform Optim Sci 40(8):1599–1610
2. Lim K, Li Z, Choi KP, Wong L (2015) A quantum leap in the reproducibility, precision, and sensitivity of gene expression profile analysis even when sample size is extremely small [J]. J Bioinform Computational Biol 13(4):1550018–1550018
3. Xue Y, Xue B, Zhang M (2019) Self-adaptive particle swarm optimization for large-scale feature selection in classification[J]. ACM Trans Knowl Discov from Data (TKDD) 13(5):1–27
4. Hambali MA, Oladele TO, Adewole KS (2020) Microarray cancer feature selection: review, challenges and research directions[J]. Int J Cogn Computing Eng 1(1):78–97
5. Rui Z, Feiping N et al (2019) Feature selection with multi-view data: a survey ScienceDirect[J]. Int J Inform Fusion 50:158–167