Affiliation:
1. Middle East Technical University
Abstract
Banks use credit scoring as an important indicator of financial strength and eligibility for credit. Scoring models assign statistical odds or probabilities to predict the risk of nonpayment in relation to the many factors involved. This paper illustrates the beneficial use of ten machine learning (ML) methods (Support Vector Machine, Gaussian Naïve Bayes, Decision Trees, Random Forest, XGBoost, K-Nearest Neighbors, Multi-layer Perceptron Neural Networks, CatBoost, Light Gradient Boosting Machine, and Logistic Regression) in identifying default risk as well as the features contributing to it. An extensive comparison is made in three aspects: (i) which ML model, with and without its own wrapper feature selection, performs best; (ii) how appropriate data scaling influences performance and computational cost; (iii) which combination of ML method, feature selection, and scaling delivers the best validation indicators, such as accuracy rate, sensitivity, and specificity. Open-access credit scoring default-risk data sets on German and Australian cases are used, for which we determine the best method, scaling, and features contributing to default risk. We also illustrate the positive contribution of the feature selection method and scaling to the performance indicators, and a ranking system.
Publisher
Research Square Platform LLC