Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination?

Published: 2022-08-24 Volume: 10 Issue: 9 Page: 169
ISSN: 2227-9091
Container-title: Risks
Language: en
Short-container-title: Risks

Author:

Hussin Adam Khatir Ahmed Almustfa, Bee Marco

Abstract

Forecasting the creditworthiness of customers is a central issue of banking activity. This task requires the analysis of large datasets with many variables, for which machine learning algorithms and feature selection techniques are crucial tools. Moreover, the proportions of “good” and “bad” customers are typically imbalanced, so over- and undersampling techniques should be employed. In the literature, most investigations tackle these three issues individually. Since there is little evidence about their joint performance, in this paper we try to fill this gap. We use five machine learning classifiers, each combined with different feature selection techniques and various data-balancing approaches. In an empirical analysis of a retail credit bank dataset, we find that the best combination is given by random forests, random forest recursive feature elimination and random oversampling.
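
As a rough illustration of the data-balancing step named in the abstract, the sketch below shows random oversampling on its own, using only the Python standard library. It is not the authors' implementation: the function name `random_oversample`, the toy rows and the 8-to-2 class split are all assumptions made for the example, and the classifier and feature-selection stages (random forests and recursive feature elimination) are deliberately left out.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    counts = Counter(labels)
    target = max(counts.values())      # size of the majority class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Draw with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (assumed): 8 "good" customers (label 0), 2 "bad" ones (label 1).
rows = [[i, i * 2] for i in range(10)]
labels = [0] * 8 + [1] * 2

bal_rows, bal_labels = random_oversample(rows, labels)
print(Counter(bal_labels))
```

After the call, both classes contain 8 rows, and every added row is an exact copy of an existing minority-class row. In a modelling workflow, this kind of resampling is usually applied to the training split only, so that duplicated rows cannot appear in the evaluation data.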

Publisher

MDPI AG

Subject

Strategy and Management; Economics, Econometrics and Finance (miscellaneous); Accounting

Link

https://www.mdpi.com/2227-9091/10/9/169/pdf

References: 51 articles.

1. Feature selection method using improved CHI Square on Arabic text classifiers: analysis and application

2. The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation; Anderson, 2007

3. Benchmarking state-of-the-art classification algorithms for credit scoring

4. A study of the behavior of several methods for balancing machine learning training data

5. Credit-Risk Modelling: Theoretical Foundations, Diagnostic Tools, Practical Examples, and Numerical Recipes in Python; Bolder, 2018

Cited by 14 articles.

1. Striking a Balance: Evaluating Credit Risk with Traditional and Machine Learning Models; Bulletin of Business and Economics (BBE); 2024-08-28

2. Methodology for Smooth Transition from Experience-Based to Data-Driven Credit Risk Assessment Modeling under Data Scarcity; Mathematics; 2024-08-02

3. Credit Scoring Prediction Using Boruta Feature Selection with Different Sampling Techniques; 2024 International Conference on Science, Engineering and Business for Driving Sustainable Development Goals (SEB4SDG); 2024-04-02

4. Discrete-Time Survival Models with Neural Networks for Age–Period–Cohort Analysis of Credit Risk; Risks; 2024-02-03

5. Deep Learning and Machine Learning Techniques for Credit Scoring: A Review; Communications in Computer and Information Science; 2024