Author:
Qin Xiwen, Zhang Shuang, Yin Dongmei, Chen Dongxue, Dong Xiaogang
Abstract
<abstract><p>Microarray technology has developed rapidly in recent years, producing large volumes of ultra-high-dimensional gene expression data. However, because the number of genes in such data far exceeds the number of samples, screening important genes from gene expression data is very challenging. For small-sample, high-dimensional biomedical data, this paper proposes a two-stage feature selection framework that combines wrapper, embedded and filter methods to avoid the curse of dimensionality. In the first stage, the framework uses weighted gene co-expression network analysis (WGCNA), random forest (RF) and minimal redundancy maximal relevance (mRMR) for feature selection. In the second stage, a new gene selection method based on an improved binary Salp Swarm Algorithm is proposed; it is combined with machine learning methods to adaptively select feature subsets suited to the classification algorithm. Finally, classification accuracy is evaluated using six methods: LightGBM, RF, SVM, XGBoost, MLP and KNN. To verify the performance of the framework and the effectiveness of the proposed algorithm, the number of genes selected and the classification accuracy were compared with those of five other intelligent optimization algorithms. The results show that the proposed framework achieves an accuracy equal to or higher than that of other advanced intelligent algorithms on 10 datasets, and an accuracy of over 97.6% on all 10 datasets. This indicates that the proposed method can solve feature selection problems for high-dimensional data; the framework is not restricted to particular datasets and can be applied to other fields that involve feature selection.</p></abstract>
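The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: a simple mean-difference score stands in for the WGCNA, RF and mRMR stage, a plain binary Salp Swarm Algorithm with a sigmoid transfer function stands in for the improved variant, and a leave-one-out 1-nearest-neighbour classifier stands in for the six classifiers. All function names, parameters (`keep`, `alpha`, `n_salps`, `n_iter`) and the synthetic two-class data are assumptions made for illustration only.

```python
import math
import random

def filter_stage(X, y, keep):
    """Stage 1 stand-in: keep the `keep` features with the largest
    two-class mean-difference score (labels are assumed to be 0/1)."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, c in zip(X, y) if c == 0]
        b = [row[j] for row, c in zip(X, y) if c == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
        scores.append(abs(ma - mb) / math.sqrt(ss / len(X) + 1e-12))
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])

def loo_1nn_accuracy(X, y, cols):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `cols`."""
    if not cols:
        return 0.0
    hits = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        hits += best_label == y[i]
    return hits / len(X)

def binary_ssa(X, y, cols, n_salps=8, n_iter=15, alpha=0.95, seed=0):
    """Stage 2 stand-in: basic binary Salp Swarm wrapper over `cols`.
    Fitness rewards accuracy and, with weight 1 - alpha, smaller subsets."""
    rng = random.Random(seed)
    dim, lb, ub = len(cols), -4.0, 4.0

    def fitness(mask):
        chosen = [c for c, bit in zip(cols, mask) if bit]
        acc = loo_1nn_accuracy(X, y, chosen)
        return alpha * acc + (1 - alpha) * (1 - len(chosen) / dim)

    def binarize(pos):
        # Sigmoid transfer: a larger position means a higher chance of selection.
        return [1 if rng.random() < 1 / (1 + math.exp(-p)) else 0 for p in pos]

    swarm = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_salps)]
    food, best_mask, best_fit = swarm[0][:], None, -1.0
    for t in range(1, n_iter + 1):
        for pos in swarm:  # evaluate and update the food source (best so far)
            mask = binarize(pos)
            f = fitness(mask)
            if f > best_fit:
                best_fit, best_mask, food = f, mask, pos[:]
        c1 = 2 * math.exp(-((4 * t / n_iter) ** 2))  # exploration decays over time
        for i, pos in enumerate(swarm):
            for j in range(dim):
                if i == 0:  # leader moves around the food source
                    step = c1 * ((ub - lb) * rng.random() + lb)
                    pos[j] = food[j] + step if rng.random() >= 0.5 else food[j] - step
                else:       # followers move toward the salp ahead of them
                    pos[j] = (pos[j] + swarm[i - 1][j]) / 2
                pos[j] = max(lb, min(ub, pos[j]))
    return [c for c, bit in zip(cols, best_mask) if bit], best_fit

# Synthetic data: 24 samples, 30 features, only the first 3 carry class signal.
data_rng = random.Random(1)
y = [i % 2 for i in range(24)]
X = [[data_rng.gauss(2.0 * c if j < 3 else 0.0, 1.0) for j in range(30)] for c in y]

stage1 = filter_stage(X, y, keep=10)        # coarse reduction: 30 -> 10 features
selected, score = binary_ssa(X, y, stage1)  # wrapper search inside the 10 survivors
print("stage 1 kept:", stage1)
print("stage 2 selected:", selected, "fitness:", round(score, 3))
```

The cheap first stage shrinks the search space so that the wrapper, which must run the classifier once per candidate subset, only explores a small number of features. The weight `alpha` is a hypothetical knob trading accuracy against subset size; the abstract does not state the fitness function actually used.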
Publisher
American Institute of Mathematical Sciences (AIMS)
Subject
Applied Mathematics, Computational Mathematics, General Agricultural and Biological Sciences, Modeling and Simulation, General Medicine
Cited by
7 articles.