CatBoost for big data: an interdisciplinary review

Published:2020-11-04 Issue:1 Volume:7 Page:
ISSN:2196-1115
Container-title:Journal of Big Data
language:en
Short-container-title:J Big Data

Author:

Hancock John T., Khoshgoftaar Taghi M.

Abstract

Gradient Boosted Decision Trees (GBDTs) are a powerful tool for classification and regression tasks in Big Data. Researchers should be familiar with the strengths and weaknesses of current GBDT implementations in order to use them effectively and make successful contributions. CatBoost is a member of the family of GBDT machine learning ensemble techniques. Since its debut in late 2018, researchers have successfully used CatBoost for machine learning studies involving Big Data. We take this opportunity to review recent research on CatBoost as it relates to Big Data. We draw best practices both from studies that cast CatBoost in a positive light and from studies where CatBoost does not outshine other techniques, since both types of scenarios offer lessons. Furthermore, as a Decision Tree-based algorithm, CatBoost is well-suited to machine learning tasks involving categorical, heterogeneous data. Recent work across multiple disciplines illustrates CatBoost’s effectiveness and shortcomings in classification and regression tasks. Another important issue we expose in the literature on CatBoost is its sensitivity to hyper-parameters and the importance of hyper-parameter tuning. One contribution we make is to take an interdisciplinary approach and cover studies related to CatBoost in a single work. This provides researchers with an in-depth understanding that helps clarify the proper application of CatBoost in solving problems. To the best of our knowledge, this is the first survey that studies all works related to CatBoost in a single publication.

Publisher

Springer Science and Business Media LLC

Subject

Information Systems and Management, Computer Networks and Communications, Hardware and Architecture, Information Systems

Link

http://link.springer.com/content/pdf/10.1186/s40537-020-00369-8.pdf

References: 104 articles.

1. Kotsiantis SB, Zaharakis I, Pintelas P. Supervised machine learning: a review of classification techniques. Emerg Artif Intellig Appl Comput Eng. 2007;160(1):3–24.

2. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. In: Bengio S, Wallach H, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R, editors. Advances in neural information processing systems 31. Curran Associates, Inc.; 2018. p. 6638–6648.

3. Johnson JM, Khoshgoftaar TM. Deep learning and data sampling with imbalanced big data. In: 2019 IEEE 20th international conference on information reuse and integration for data science (IRI). 2019; p. 175–183.

4. Johnson JM, Khoshgoftaar TM. Medicare fraud detection using neural networks. J Big Data. 2019;1:1.

5. Yasunari M, Takuomi H, Anna O, Kouichi Y, Uesawa Y. Prediction model of aryl hydrocarbon receptor activation by a novel QSAR approach, DeepSnap-deep learning. Molecules. 2020;25(6):1317.

Cited by 326 articles.

1. Recognition of human mood, alertness and comfort under the influence of indoor lighting using physiological features;Biomedical Signal Processing and Control;2024-03

2. Deep neural network with empirical mode decomposition and Bayesian optimisation for residential load forecasting;Expert Systems with Applications;2024-03

3. A machine learning-based ensemble model for estimating diurnal variations of nitrogen oxide concentrations in Taiwan;Science of The Total Environment;2024-03

4. Artificial intelligence for forecasting sales of agricultural products: A case study of a Moroccan agricultural company;Journal of Open Innovation: Technology, Market, and Complexity;2024-03

5. Precision Leak Detection in Supermarket Refrigeration Systems Integrating Categorical Gradient Boosting with Advanced Thresholding;Energies;2024-02-04