A penalized variable selection ensemble algorithm for high-dimensional group-structured data

Published:2024-02-05 Issue:2 Volume:19 Page:e0296748
ISSN:1932-6203
Container-title:PLOS ONE
language:en
Short-container-title:PLoS ONE

Author:

Li Dongsheng, Pan Chunyan, Zhao Jing, Luo Anfei

Abstract

This paper presents StackingGroup, a multi-algorithm fusion model built on the Stacking ensemble learning framework, to address variable selection in high-dimensional group-structured data. Because different algorithms observe the data and train on different principles, the proposed method combines Stacking ensemble learning with multiple group-structure regularization methods so as to draw on the strengths of each model. The data set is divided into K equal parts, more than 10 algorithms are evaluated as candidate base learners, and base learners are chosen for low mutual correlation, strong predictive ability, and small model error. The grSubset + grLasso, grLasso, and grSCAD algorithms were ultimately selected as the base learners, with the Lasso algorithm as the meta-learner; the resulting combined algorithm is called StackingGroup. In simulation experiments, the proposed method outperformed the other prediction methods in terms of R², RMSE, and MAE. Finally, the algorithm was applied to investigate risk factors for low birth weight in infants and young children, where it achieved a mean absolute error (MAE) of 0.508 and a root mean square error (RMSE) of 0.668. Both values are smaller than those obtained from the single models, indicating that the proposed method surpasses the other algorithms in prediction accuracy.
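The procedure outlined in the abstract (split the data into K parts, collect out-of-fold predictions from several base learners, then fit a Lasso meta-learner on those predictions) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learners are hypothetical one-feature least-squares stand-ins for grSubset + grLasso, grLasso, and grSCAD, the data are synthetic, the penalty value is arbitrary, and every function name is invented for the example.

```python
import math
import random
from statistics import fmean


def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def one_feature_ols(col):
    """Hypothetical stand-in base learner: least squares on one column."""
    def fit(X, y):
        xs = [row[col] for row in X]
        mx, my = fmean(xs), fmean(y)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        slope = sxy / sxx if sxx else 0.0
        return lambda row: my + slope * (row[col] - mx)
    return fit


def out_of_fold(X, y, learners, k=5, seed=0):
    """Level-1 features: each row is predicted by models that never saw it."""
    n = len(y)
    Z = [[0.0] * len(learners) for _ in range(n)]
    for fold in kfold_indices(n, k, seed):
        held = set(fold)
        train = [i for i in range(n) if i not in held]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fit in enumerate(learners):
            predict = fit(X_tr, y_tr)
            for i in fold:
                Z[i][j] = predict(X[i])
    return Z


def soft_threshold(a, lam):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    if a > lam:
        return a - lam
    if a < -lam:
        return a + lam
    return 0.0


def fit_lasso(Z, y, lam=0.01, sweeps=200):
    """Coordinate descent for (1/2n)*||y - b0 - Z w||^2 + lam*||w||_1."""
    n, p = len(y), len(Z[0])
    means = [fmean(row[j] for row in Z) for j in range(p)]
    y_bar = fmean(y)
    cols = [[row[j] - means[j] for row in Z] for j in range(p)]  # centered
    resid = [t - y_bar for t in y]  # residual when all weights are zero
    w = [0.0] * p
    for _ in range(sweeps):
        for j, col in enumerate(cols):
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, resid)) / n
            new_wj = soft_threshold(rho, lam) / scale
            step = new_wj - w[j]
            resid = [r - c * step for r, c in zip(resid, col)]
            w[j] = new_wj
    b0 = y_bar - sum(wj * m for wj, m in zip(w, means))
    return b0, w


def stack_predict(Z, b0, w):
    """Combine base-learner predictions with the meta-learner weights."""
    return [b0 + sum(wj * z for wj, z in zip(w, row)) for row in Z]


def rmse(y, y_hat):
    return math.sqrt(fmean((a - b) ** 2 for a, b in zip(y, y_hat)))


def mae(y, y_hat):
    return fmean(abs(a - b) for a, b in zip(y, y_hat))


# Synthetic data (an assumption for illustration, not the paper's data).
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(60)]
y = [2 * r[0] - r[1] + 0.1 * rng.gauss(0, 1) for r in X]

learners = [one_feature_ols(c) for c in range(3)]
Z = out_of_fold(X, y, learners, k=5)
b0, w = fit_lasso(Z, y, lam=0.01)
y_hat = stack_predict(Z, b0, w)
print("meta weights:", [round(v, 3) for v in w])
print("RMSE:", round(rmse(y, y_hat), 3), "MAE:", round(mae(y, y_hat), 3))
```

The meta-learner is trained only on out-of-fold predictions so that it does not reward base learners for memorizing their own training rows. In a full pipeline the base learners would typically be refit on all the data before predicting new observations, and the errors printed here are in-sample for the meta-learner, so they are not comparable to the values reported in the abstract.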

Funder

The Guizhou Provincial Department of Education's Youth Growth Project Fund

the Educational Department of Guizhou under Grant

Publisher

Public Library of Science (PLoS)

References (39 articles)

1. Regression Shrinkage and Selection via the Lasso;R. Tibshirani;Journal of the Royal Statistical Society. Series B (Methodological),1996

2. Heuristics of Instability and Stabilization in Model Selection;Leo Breiman;The Annals of Statistics,1996

3. Regression shrinkage and selection via the lasso: a retrospective;R. Tibshirani;Journal of the Royal Statistical Society: Series B (Statistical Methodology),2011

4. Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties;Jianqing Fan;Journal of the American Statistical Association,2001

5. The Adaptive Lasso and Its Oracle Properties;H. Zou;Journal of the American Statistical Association,2006