The impact of Bayesian optimization on feature selection

Published:2024-02-17 Issue:1 Volume:14 Page:
ISSN:2045-2322
Container-title:Scientific Reports
language:en
Short-container-title:Sci Rep

Author:

Yang Kaixin, Liu Long, Wen Yalu

Abstract

Feature selection is an indispensable step in the analysis of high-dimensional molecular data. Despite its importance, consensus is lacking on how to choose the most appropriate feature selection method, especially when the performance of the method itself depends on hyper-parameters. Bayesian optimization has demonstrated its advantages in automatically configuring hyper-parameter settings for various models. However, it remains unclear whether Bayesian optimization can benefit feature selection methods. In this research, we conducted extensive simulation studies to compare the performance of various feature selection methods, with a particular focus on the impact of Bayesian optimization on those that require hyper-parameter tuning. We further used gene expression data from the Alzheimer's Disease Neuroimaging Initiative to predict various brain imaging-related phenotypes, employing various feature selection methods to mine the data. The simulation studies showed that feature selection methods with hyper-parameters tuned by Bayesian optimization often yield better recall rates, and the analysis of transcriptomic data further revealed that Bayesian optimization-guided feature selection can improve the accuracy of disease risk prediction models. In conclusion, Bayesian optimization can facilitate feature selection methods when hyper-parameter tuning is needed and has the potential to substantially benefit downstream tasks.

Funder

the National Natural Science Foundation of China

Early Career Research Excellence Award from the University of Auckland

the Marsden Fund from Royal Society of New Zealand

Publisher

Springer Science and Business Media LLC

Link

https://www.nature.com/articles/s41598-024-54515-w.pdf

Reference 58 articles.

1. Shan, N. et al. A novel transcriptional risk score for risk prediction of complex human diseases. Genet. Epidemiol. 45(8), 811–820. https://doi.org/10.1002/gepi.22424 (2021).

2. Pudjihartono, N., Fadason, T., Kempa-Liehr, A. W. & O’Sullivan, J. M. A review of feature selection methods for machine learning-based disease risk prediction. Front. Bioinform. 2, 927312. https://doi.org/10.3389/fbinf.2022.927312 (2022).

3. Liu, L. et al. Explainable deep transfer learning model for disease risk prediction using high-dimensional genomic data. PLoS Comput. Biol. 18(7), e1010328. https://doi.org/10.1371/journal.pcbi.1010328 (2022).

4. Ang, J. C., Mirzal, A., Haron, H. & Hamed, H. N. Supervised, unsupervised, and semi-supervised feature selection: A review on gene selection. IEEE/ACM Trans. Comput. Biol. Bioinf. 13(5), 971–989. https://doi.org/10.1109/TCBB.2015.2478454 (2015).

5. Fan, J. & Lv, J. Sure independence screening for ultra-high dimensional feature space. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 70(5), 849–911. https://doi.org/10.1111/j.1467-9868.2008.00674.x (2008).

Cited by 2 articles.

1. A Methodology for Forecasting Demands in a Water Distribution Network Based on the Classical and Neural Networks Approach;The 3rd International Joint Conference on Water Distribution Systems Analysis & Computing and Control for the Water Industry (WDSA/CCWI 2024);2024-09-02

2. ONE3A: one-against-all authentication model for smartphone using GAN network and optimization techniques;PeerJ Computer Science;2024-04-29