LGBM-CBFS: A Heuristic Feature Sampling Method Based on Tree Ensembles

Author:

Zhou Yu (1), Li Hui (1), Chen Mei (1)

Affiliation:

1. College of Computer Science and Technology, Guizhou University, Guiyang, China

Abstract

Gradient boosting decision tree (GBDT) is widely used in academia, industry, and data science competitions because of its state-of-the-art performance. As data volumes surge, however, the efficiency of the model is limited by overwhelming training cost. A common solution is data reduction by sampling the training data. Popular GBDT implementations such as XGBoost and LightGBM both support reducing the search space by using only a random subset of features, chosen without any prior knowledge. This is ineffective and may cause the model to fail to converge when sampling from a high-dimensional feature space with a small sampling rate. To mitigate this problem, we propose LGBM-CBFS, a heuristic sampling algorithm that samples features based on available prior knowledge, namely "importance scores", to improve the performance and effectiveness of GBDT. Experimental results indicate that LGBM-CBFS achieves higher model accuracy than uniform sampling, without introducing unacceptable time cost, in sparse high-dimensional scenarios.
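The contrast between uniform feature sampling and importance-guided sampling can be illustrated with a minimal sketch. This is not the LGBM-CBFS algorithm itself, whose details are not given in the abstract; it only assumes that per-feature importance scores are available and that features are drawn without replacement with probability proportional to those scores. The function name `sample_features`, the `smoothing` parameter, and the toy importance vector are all hypothetical.

```python
import numpy as np


def sample_features(importance, rate, rng, smoothing=1e-3):
    """Draw a feature subset without replacement, favouring important features.

    Hypothetical helper: `smoothing` is an assumed additive constant that keeps
    zero-importance features reachable, so the sampler can still explore them.
    """
    importance = np.asarray(importance, dtype=float)
    n_features = importance.size
    n_keep = max(1, int(round(rate * n_features)))
    weights = importance + smoothing
    probs = weights / weights.sum()
    chosen = rng.choice(n_features, size=n_keep, replace=False, p=probs)
    return np.sort(chosen)


rng = np.random.default_rng(0)

# Toy sparse high-dimensional setting: 5 informative features out of 1000.
importance = np.zeros(1000)
importance[:5] = 100.0

# Importance-guided subset versus a uniform subset of the same size.
weighted = sample_features(importance, rate=0.05, rng=rng)
uniform = np.sort(rng.choice(1000, size=50, replace=False))

print("informative features kept (weighted):", int(np.sum(weighted < 5)))
print("informative features kept (uniform): ", int(np.sum(uniform < 5)))
```

With a 5% sampling rate, a uniform draw keeps each informative feature only about one time in twenty, whereas the weighted draw almost always retains all of them. That is the failure mode of uniform sampling that the abstract describes for small sampling rates.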

Funder

National Natural Science Foundation of China

Publisher

Hindawi Limited

Subject

Computer Networks and Communications, Information Systems


Cited by 2 articles.
