Author:
Han Wenfang, Gu Xiao, Jian Ling
Abstract
Credit risk assessment plays a key role in determining the banking policies and commercial strategies of financial institutions. Ensemble learning approaches have been shown to be more competitive than individual classifiers and statistical techniques for default prediction. However, most studies have focused on improving overall prediction accuracy rather than on identifying loans that actually default. In addition, model interpretability has received insufficient attention in previous studies. To fill these gaps, we propose a Multi-layer Multi-view Stacking Integration (MLMVS) approach to predict default risk in the P2P lending scenario. As its main innovation, our proposal uses multi-view learning and soft probability outputs to build a multi-layer integration based on stacking. LIME, an interpretable artificial intelligence tool, is embedded to interpret the prediction results. We perform a comprehensive analysis of MLMVS on the Lending Club dataset and compare it experimentally with a number of well-known individual classifiers and ensemble classification methods; the results demonstrate the superiority of MLMVS.
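The general idea of stacking soft probability outputs from several feature views can be illustrated with a minimal sketch. This is not the authors' MLMVS implementation: the synthetic data, the two column groups used as "views", the plain logistic regression standing in for both base and meta learners, and the helper names `fit_logreg`, `predict_proba`, and `stack_views` are all assumptions made for illustration, and only a single stacking layer is shown.

```python
import numpy as np

def fit_logreg(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by gradient descent; the bias is the last weight."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict_proba(w, X):
    """Return the soft probability of the positive (default) class."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def stack_views(X, y, views, n_folds=5, seed=0):
    """Build out-of-fold soft probabilities, one meta-feature column per view."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    meta = np.zeros((len(y), len(views)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        for j, cols in enumerate(views):
            w = fit_logreg(X[np.ix_(train, cols)], y[train])
            meta[held_out, j] = predict_proba(w, X[np.ix_(held_out, cols)])
    # Refit each base model on all training rows for use at prediction time.
    base_models = [fit_logreg(X[:, cols], y) for cols in views]
    return meta, base_models

# Synthetic, imbalanced stand-in for loan data (hypothetical, not Lending Club).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 4))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 2] - 1.0  # negative intercept: defaults are the minority
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logits))).astype(float)

views = [[0, 1], [2, 3]]  # two hypothetical feature views (column groups)
X_train, y_train, X_test, y_test = X[:300], y[:300], X[300:], y[300:]

# Layer 1: per-view base learners emit soft probabilities.
meta_train, base_models = stack_views(X_train, y_train, views)
# Layer 2: the meta learner is trained on those probabilities.
meta_w = fit_logreg(meta_train, y_train)
meta_test = np.column_stack(
    [predict_proba(w, X_test[:, cols]) for w, cols in zip(base_models, views)]
)
default_prob = predict_proba(meta_w, meta_test)
```

Training the meta learner on out-of-fold probabilities, rather than on predictions for rows the base learners have already seen, keeps label information from leaking into the second layer. Further layers could be added by repeating the same step on `meta_train`.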
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science