Improvements in PD models. A case-study approach

Authors:

Caplescu Raluca Dana¹, Cojocea Manuela-Simona², Pele Daniel Traian¹, Strat Vasile Alecsandru¹

Affiliation:

1. Bucharest University of Economic Studies, Bucharest, Romania

2. University of Bucharest, Bucharest, Romania

Abstract

Models for estimating the probability of default (PD) are widely used by lenders throughout the lending process, starting as early as the application stage, where they play an important role in the loan approval decision. Adequate data quality is essential for model soundness and performance. Identifying outliers, analyzing their impact, and choosing the right method to treat them is a necessary preprocessing stage, yet it is often overlooked in practice for a variety of reasons, an important one being insufficient data. Given the inherent imbalance of the loan portfolio with regard to default status, eliminating outliers is seldom feasible. The current widely accepted approach is based on binning and weight of evidence. Usually two types of binning are tested: bucket and quantile. While the latter is robust to the presence of outliers, the former is not. Both approaches discretize the continuous variable they are applied to. This causes information loss, both in the variation carried by individual values and in the distance between observations on a given variable. In the present paper, we explore other methods for dealing with outliers and describe their advantages and disadvantages in the context of probability of default estimation for credit risk. We conclude that, aside from quantile binning, winsorizing and, for very large datasets, leaving outliers untreated are also effective. More importantly, several methods should be considered and tested for each variable in order to find the optimal balance between altering the data and reducing variance.
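The contrast between bucket binning, quantile binning, and winsorizing can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the synthetic variable (99 ordinary values plus one extreme outlier), the 1st/99th percentile winsorizing limits, the default labels, and the smoothing constant `eps` are all assumptions made for illustration. Weight of evidence is computed here as the log of the share of non-defaults over the share of defaults in each bin, which is one common convention; some sources use the inverse ratio.

```python
import bisect
import math


def bucket_edges(values, n_bins):
    """Interior cut points for equal-width (bucket) bins between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    return [lo + i * width for i in range(1, n_bins)]


def quantile_edges(values, n_bins):
    """Interior cut points for equal-frequency (quantile) bins."""
    s = sorted(values)
    return [s[(len(s) * i) // n_bins] for i in range(1, n_bins)]


def assign_bins(values, edges):
    """Bin index of each value, given interior cut points."""
    return [bisect.bisect_right(edges, v) for v in values]


def bin_counts(values, edges):
    """Number of observations that fall into each bin."""
    counts = [0] * (len(edges) + 1)
    for idx in assign_bins(values, edges):
        counts[idx] += 1
    return counts


def winsorize(values, lower=0.01, upper=0.99):
    """Clip values to the empirical lower/upper quantiles (assumed limits)."""
    s = sorted(values)
    lo = s[int(lower * (len(s) - 1))]
    hi = s[int(upper * (len(s) - 1))]
    return [min(max(v, lo), hi) for v in values]


def weight_of_evidence(bins, defaults, n_bins, eps=0.5):
    """WoE per bin: ln(share of non-defaults / share of defaults).

    eps is an assumed additive smoothing term that avoids log(0)
    in bins with no defaults or no non-defaults.
    """
    good = [eps] * n_bins
    bad = [eps] * n_bins
    for idx, is_default in zip(bins, defaults):
        if is_default:
            bad[idx] += 1
        else:
            good[idx] += 1
    total_good, total_bad = sum(good), sum(bad)
    return [math.log((g / total_good) / (b / total_bad))
            for g, b in zip(good, bad)]


# Synthetic variable: 99 ordinary values and one extreme outlier.
values = list(range(1, 100)) + [10_000]
# Hypothetical labels: the ten lowest values are marked as defaults.
defaults = [v <= 10 for v in values]
n_bins = 5

bucket_counts = bin_counts(values, bucket_edges(values, n_bins))
quantile_counts = bin_counts(values, quantile_edges(values, n_bins))

winsorized = winsorize(values)
winsorized_bucket_counts = bin_counts(winsorized, bucket_edges(winsorized, n_bins))

quantile_bins = assign_bins(values, quantile_edges(values, n_bins))
woe = weight_of_evidence(quantile_bins, defaults, n_bins)

print("bucket bins:           ", bucket_counts)
print("quantile bins:         ", quantile_counts)
print("bucket after winsorize:", winsorized_bucket_counts)
print("WoE (quantile bins):   ", [round(w, 2) for w in woe])
```

On this toy data a single outlier stretches the bucket edges so that 99 of the 100 observations fall into the first bin, while the quantile bins stay balanced at 20 observations each. After winsorizing, the same bucket scheme spreads the observations almost evenly, and the clipped variable remains continuous rather than discretized.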

Publisher

Walter de Gruyter GmbH

Subject

General Earth and Planetary Sciences, General Environmental Science
