A penalized variable selection ensemble algorithm for high-dimensional group-structured data

Authors:

Li Dongsheng, Pan Chunyan, Zhao Jing, Luo Anfei

Abstract

This paper presents a multi-algorithm fusion model, StackingGroup, built on the Stacking ensemble learning framework to address variable selection in high-dimensional group-structured data. Different algorithms observe the data differently and follow different training principles. The proposed method therefore combines Stacking ensemble learning with several group-structured regularization methods so that it can exploit the strengths of each model. The data set is divided evenly into K folds, more than 10 algorithms are evaluated as candidate base learners, and the base learners are chosen for low mutual correlation, strong predictive ability, and small model error. On this basis, grSubset + grLasso, grLasso, and grSCAD were selected as the base learners of the Stacking algorithm, and Lasso was used as the meta-learner. The resulting combined algorithm, StackingGroup, is designed for high-dimensional group-structured data. Simulation experiments showed that the proposed method outperformed the competing prediction methods in terms of R², RMSE, and MAE. Finally, the proposed algorithm was applied to investigate risk factors for low birth weight in infants and young children. It achieved a mean absolute error (MAE) of 0.508 and a root mean square error (RMSE) of 0.668, both smaller than those obtained from the single models, indicating that the proposed method predicts more accurately than the other algorithms.
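The abstract describes the pipeline only at a high level: K-fold splitting, group-penalized base learners, out-of-fold predictions, and a Lasso meta-learner, evaluated with R², RMSE, and MAE. The NumPy code below is a minimal sketch of that stacking scheme under explicit assumptions. The base learners are hypothetical stand-ins: a group Lasso and a group SCAD penalty (with a = 3.7), both fitted by proximal gradient. They are not the authors' grLasso/grSCAD implementations, and the grSubset + grLasso learner is omitted. All function names (`fit_group`, `stacking_fit`, and so on), the λ values, K = 5, and the synthetic data are illustrative choices, not taken from the paper.

```python
import numpy as np

def scad_norm(z, thr, t, a=3.7):
    """Proximal map of t * SCAD(.; thr) applied to a non-negative group norm z (requires a > 1 + t)."""
    if z <= (1 + t) * thr:
        return max(z - t * thr, 0.0)                      # soft-thresholding zone
    if z <= a * thr:
        return ((a - 1) * z - t * a * thr) / (a - 1 - t)  # tapering zone
    return z                                              # large norms are left unpenalized

def fit_group(X, y, groups, lam, penalty="lasso", n_iter=500):
    """Minimize (1/2n)||y - b0 - X b||^2 + sum_g pen(||b_g||; lam * sqrt(p_g)) by proximal gradient."""
    n = X.shape[0]
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean                       # centering handles the intercept
    t = min(n / np.linalg.norm(Xc, 2) ** 2, 1.0)          # step <= 1/L; t <= 1 keeps a > 1 + t
    masks = [groups == g for g in np.unique(groups)]
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta + t * Xc.T @ (yc - Xc @ beta) / n        # gradient step on the squared loss
        for m in masks:                                   # group-wise shrinkage of each block
            thr = lam * np.sqrt(m.sum())
            norm = np.linalg.norm(z[m])
            new = max(norm - t * thr, 0.0) if penalty == "lasso" else scad_norm(norm, thr, t)
            if norm > 0:
                z[m] *= new / norm
        beta = z
    return y_mean - x_mean @ beta, beta                   # (intercept, coefficients)

def predict(model, X):
    b0, beta = model
    return b0 + X @ beta

def stacking_fit(X, y, groups, base_specs, meta_lam, K=5, seed=0):
    """Out-of-fold predictions from each base learner become the meta-learner's inputs."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % K   # K folds of (almost) equal size
    oof = np.zeros((n, len(base_specs)))
    for k in range(K):
        tr, te = folds != k, folds == k
        for j, (pen, lam) in enumerate(base_specs):
            oof[te, j] = predict(fit_group(X[tr], y[tr], groups, lam, pen), X[te])
    bases = [fit_group(X, y, groups, lam, pen) for pen, lam in base_specs]  # refit on all data
    # Singleton groups make the group penalty an ordinary Lasso: the meta-learner.
    meta = fit_group(oof, y, np.arange(len(base_specs)), meta_lam, "lasso")
    return bases, meta

def stacking_predict(bases, meta, X):
    Z = np.column_stack([predict(m, X) for m in bases])
    return predict(meta, Z)

def metrics(y, yhat):
    resid = y - yhat
    r2 = 1 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, np.sqrt(np.mean(resid ** 2)), np.mean(np.abs(resid))   # R2, RMSE, MAE

# Illustrative synthetic data (not the paper's simulation design).
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(10), 5)                      # 10 groups of 5 variables
X = rng.standard_normal((200, 50))
beta_true = np.zeros(50)
beta_true[:10] = 1.0                                      # only the first two groups matter
y = X @ beta_true + 0.5 * rng.standard_normal(200)
specs = [("lasso", 0.05), ("scad", 0.05)]                 # (penalty, lambda) per base learner
bases, meta = stacking_fit(X[:150], y[:150], groups, specs, meta_lam=0.01)
r2, rmse, mae = metrics(y[150:], stacking_predict(bases, meta, X[150:]))
print(f"R2={r2:.3f} RMSE={rmse:.3f} MAE={mae:.3f}")
```

Reusing one proximal-gradient routine keeps the sketch short. With singleton groups, the group penalty reduces to the plain Lasso used here as the meta-learner. A real analysis would more likely rely on an established implementation, such as the R package grpreg, whose penalty options include grLasso and grSCAD.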

Funder

The Guizhou Provincial Department of Education's Youth Growth Project Fund

The Educational Department of Guizhou under Grant

Publisher

Public Library of Science (PLoS)

