Author:
Gao Xiang, Wen Junhao, Zhang Cheng
Abstract
This paper proposes an incremental random forest algorithm for classification and prediction on dynamically growing data. Traditional batch machine learning algorithms build the model once and cannot let newly generated samples participate in learning, which leads to excessive model deviation as new data arrives. The proposed algorithm combines incremental learning with random forest to address this. Applied to predicting credit card customer default behavior, it can help banks control risk and reduce losses, which is important both for auditing card issuance and for early warning of cardholder risk. In an experiment on credit card holder data from a bank in Taiwan, the algorithm performed relatively better than random forest, decision tree, logistic regression, naive Bayes, BP neural network, and support vector machine.
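The abstract does not state how the forest is updated when new samples arrive. As a rough illustration of the general idea only, the sketch below grows an ensemble batch by batch: each new batch trains a few additional depth-1 trees on bootstrap samples with a randomly chosen feature, and the oldest trees are dropped once a cap is reached. This is not the authors' algorithm; the names `fit_stump`, `IncrementalForest`, `partial_fit`, `trees_per_batch`, and `max_trees` are hypothetical, and the toy data is made up.

```python
import random
from collections import Counter


def fit_stump(X, y, rng):
    """Fit a depth-1 tree on a bootstrap sample using one random feature."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
    feature = rng.randrange(len(X[0]))           # random feature selection
    best = None
    for t in sorted({X[i][feature] for i in idx}):
        left = [y[i] for i in idx if X[i][feature] <= t]
        right = [y[i] for i in idx if X[i][feature] > t]
        if not left or not right:
            continue
        l_lab, l_cnt = Counter(left).most_common(1)[0]
        r_lab, r_cnt = Counter(right).most_common(1)[0]
        errors = n - l_cnt - r_cnt               # misclassified bootstrap samples
        if best is None or errors < best[0]:
            best = (errors, t, l_lab, r_lab)
    if best is None:                             # no valid split: predict majority label
        label = Counter(y[i] for i in idx).most_common(1)[0][0]
        return (feature, 0.0, label, label)
    return (feature, best[1], best[2], best[3])


class IncrementalForest:
    """Hypothetical sketch: an ensemble that adds trees as new batches arrive."""

    def __init__(self, trees_per_batch=10, max_trees=50, seed=0):
        self.trees_per_batch = trees_per_batch
        self.max_trees = max_trees
        self.rng = random.Random(seed)
        self.trees = []

    def partial_fit(self, X, y):
        # Train new trees on the new batch only, then keep the most recent ones.
        new = [fit_stump(X, y, self.rng) for _ in range(self.trees_per_batch)]
        self.trees = (self.trees + new)[-self.max_trees:]
        return self

    def predict(self, X):
        # Majority vote over all trees currently in the ensemble.
        out = []
        for row in X:
            votes = Counter(l if row[f] <= t else r for f, t, l, r in self.trees)
            out.append(votes.most_common(1)[0][0])
        return out


if __name__ == "__main__":
    # Made-up toy batches: label 1 ("default") when the first feature is large.
    batch1 = ([[i, 2 * i] for i in range(10)], [0] * 5 + [1] * 5)
    batch2 = ([[i, 2 * i] for i in range(10, 20)], [1] * 10)
    forest = IncrementalForest(trees_per_batch=10, max_trees=25, seed=0)
    forest.partial_fit(*batch1)      # initial model
    forest.partial_fit(*batch2)      # later samples join without full retraining
    print(len(forest.trees), forest.predict([[0, 0], [15, 30]]))
```

Dropping the oldest trees is only one possible update policy; a real incremental forest might instead update leaf statistics in place or replace trees by measured accuracy.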
Subject
General Physics and Astronomy