Author:
Tang Mingzhi,Zeng Wenhao,Zhao Runzhou
Abstract
In recent years, leveraging financial big data and machine learning to identify corporate risks has emerged as a crucial approach for financial risk management. This paper proposes a method based on financial big data and the LightGBM model to effectively assess corporate credit risk ratings. Feature engineering is performed on corporate financial datasets, using correlation coefficients, chi-square tests, and machine learning techniques to select essential financial indicators. Subsequently, bayesian optimization is employed for hyperparameter tuning, using the classification accuracy of high risk and highest risk categories as the objective function. This process yields a multi-classification model capable of effectively identifying corporate credit risk ratings through financial data. The results demonstrate that the model exhibits strong identification capabilities for high credit risk corporates. The model achieves the best classification performance for high-risk categories, with an accuracy of 74%. The comprehensive classification accuracy and recall rate for both high-risk and highest-risk categories reach 70%. The overall classification accuracy across all categories is approximately 64%. In summary, through judicious model selection, data preprocessing, feature selection, Bayesian parameter tuning, and the establishment of appropriate objective functions, the LightGBM model demonstrates robust performance in addressing corporate credit risk rating problems.