Affiliation:
1. Christ Deemed To Be University
Abstract
Insurance fraud is a growing concern that calls for proactive detection using advanced machine learning techniques. This research builds a predictive model that distinguishes genuine from fraudulent auto insurance claims. The dataset, comprising 1,000 instances and 40 attributes, covers customer demographics, policy details, incident information, and financial data. Early fraud detection is crucial for mitigating financial losses and maintaining the integrity of the insurance system. The study applies data preprocessing to handle missing values, and uses XGBoost feature importance, variance thresholding, and correlation analysis to select features and improve model interpretability. Nine machine learning algorithms are compared. Linear Discriminant Analysis emerges as the leading classifier, achieving 84% accuracy, while a hard-voting ensemble of Logistic Regression and XGBoost reaches a competitive 83.0% accuracy with a notable precision of 91%, showing the strength of combining diverse models. The study emphasizes the importance of preprocessing, feature selection, and ensemble learning for optimizing fraud detection. The refined model achieves a Brier loss of 0.00054, indicating minimal discrepancy between predicted probabilities and actual outcomes in this binary classification task. An exploration of principal component analysis (PCA) combined with multiple linear regression reveals a trade-off between model simplicity and performance: retaining 32 components preserves 95% of the variance and yields a value of 0.7967, while retaining 35 components reaches the highest value, 0.9991, showing that dimensionality reduction can capture nearly all of the variance in the data.
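Three quantities named in the abstract (the Brier loss, a hard-voting ensemble, and choosing the number of PCA components that preserves 95% of variance) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the function names `brier_score`, `hard_vote`, and `components_for_variance` are hypothetical, labels are assumed to be encoded as 1 = fraudulent and 0 = genuine, and the tie-breaking rule in `hard_vote` (a tie resolves to genuine) is an assumption, since the abstract does not say how the two-model ensemble resolves disagreements.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], p_fraud: Sequence[float]) -> float:
    """Mean squared difference between the predicted fraud probability and the 0/1 label."""
    if len(y_true) != len(p_fraud) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_fraud)) / len(y_true)


def hard_vote(*model_predictions: Sequence[int]) -> list[int]:
    """Majority vote over 0/1 predictions from several models.

    Assumption: a tie resolves to 0 (genuine), so with two models a claim is
    flagged as fraud only when both models agree.
    """
    combined = []
    for votes in zip(*model_predictions):
        fraud_votes = sum(votes)
        genuine_votes = len(votes) - fraud_votes
        combined.append(1 if fraud_votes > genuine_votes else 0)
    return combined


def components_for_variance(explained_ratio: Sequence[float], target: float = 0.95) -> int:
    """Smallest number of leading components whose cumulative explained variance reaches target.

    Assumes explained_ratio is sorted in descending order, as PCA output usually is.
    """
    cumulative = 0.0
    for k, ratio in enumerate(explained_ratio, start=1):
        cumulative += ratio
        if cumulative >= target:
            return k
    return len(explained_ratio)


if __name__ == "__main__":
    logistic_preds = [0, 1, 1, 0]  # hypothetical predictions
    xgboost_preds = [0, 0, 1, 1]   # hypothetical predictions
    print(hard_vote(logistic_preds, xgboost_preds))  # prints [0, 0, 1, 0]
    print(brier_score([0, 1], [0.1, 0.8]))
    print(components_for_variance([0.6, 0.3, 0.07, 0.03]))
```

Under the assumed tie-break, a two-model hard vote flags fraud only when both models agree, which tends to trade recall for precision. This is one possible reading of a high-precision ensemble, not a claim about how the study's ensemble was configured.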
Publisher
Research Square Platform LLC