Affiliation:
1. College of Computer Science and Technology, Guizhou University, Guiyang, China
Abstract
Gradient boosting decision tree (GBDT) is widely used in academia, industry, and data science competitions because of its state-of-the-art performance. However, the model's efficiency is limited by the overwhelming training cost that accompanies the surge of data. A common solution is data reduction by sampling the training data. Popular GBDT implementations such as XGBoost and LightGBM both support cutting the search space by using only a random subset of features chosen without any prior knowledge, which is ineffective and may cause the model to fail to converge when sampling a high-dimensional feature space at a small sampling rate. To mitigate this problem, we propose a heuristic sampling algorithm, LGBM-CBFS, which samples features based on an available form of prior knowledge called "importance scores" to improve the performance and effectiveness of GBDT. Experimental results indicate that LGBM-CBFS obtains higher model accuracy than uniform sampling without introducing unacceptable time cost in sparse high-dimensional scenarios.
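The abstract does not spell out the exact LGBM-CBFS procedure, but the core idea of importance-weighted feature sampling can be sketched. The following is a minimal, hypothetical illustration in Python: gain-based importance scores from a small pilot LightGBM model are used as sampling weights, so informative features are more likely to be retained than under uniform sampling. The helper name, pilot-model settings, and sampling rate are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of importance-weighted feature sampling for GBDT.
# Not the paper's LGBM-CBFS algorithm; it only illustrates the general
# idea of sampling features in proportion to "importance scores".
import numpy as np
import lightgbm as lgb

def importance_weighted_feature_sample(X, y, sample_rate=0.1, seed=0):
    """Return column indices sampled with probability proportional to
    gain-based importance scores from a small pilot model."""
    pilot = lgb.LGBMClassifier(n_estimators=20, random_state=seed)
    pilot.fit(X, y)
    scores = pilot.booster_.feature_importance(importance_type="gain")
    scores = scores.astype(float) + 1e-12   # avoid an all-zero weight vector
    probs = scores / scores.sum()
    n_keep = max(1, int(sample_rate * X.shape[1]))
    rng = np.random.default_rng(seed)
    return rng.choice(X.shape[1], size=n_keep, replace=False, p=probs)

# Usage: train the final model on the sampled feature subset.
# cols = importance_weighted_feature_sample(X, y, sample_rate=0.1)
# model = lgb.LGBMClassifier().fit(X[:, cols], y)
```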
Funder
National Natural Science Foundation of China
Subject
Computer Networks and Communications, Information Systems
Cited by
2 articles.