SVM Model Against Telecom Card Fraud Using GA Optimised Ten-Fold Cross-Testing

Author:

Wei Peng

Abstract

The growing number of payment methods makes it easier for criminals to steal personal information, take over financial payment accounts and steal money. With a vast number of bank card transactions occurring every day, Credit Card Fraud Detection (CCFD) is a serious challenge, so this paper predicts whether or not fraud occurs using six types of machine learning models. For Problem 1, the mean, maximum, minimum, median, variance, standard deviation and quartiles are first calculated for each indicator. Data cleaning is then carried out, and the data set is found to be free of missing values and outliers. In the preprocessing step, min-max normalisation and z-score standardisation are applied to the data. Correlation analysis follows: according to the characteristics of the indicators themselves, the first four are classified as negative indicators and the last three as positive indicators, and the Pearson correlation coefficients are calculated after each of the two data-processing methods. The coefficient of variation method is used to calculate the weights of the seven indicators that influence whether fraud occurs. Finally, a BP neural network model, a decision tree model, a random forest classification model, an ELM model, an SVM model and a logistic regression model are established. For Problem 2, four of the models constructed in Problem 1 are solved. For the BP neural network model, the data set is divided into a training set and a testing set in a 6:4 ratio, and the sigmoid function is used as the activation function. An output greater than 0.5 is recorded as 1, i.e. fraudulent behaviour, and an output less than 0.5 is recorded as 0, i.e. non-fraudulent behaviour. By adjusting the learning rate and the number of iterations, a smaller average mean squared error is obtained after gradient descent.
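The preprocessing and weighting steps named above can be illustrated with a minimal sketch. It is not the paper's code: the data are a random stand-in for the seven indicators, the function names are hypothetical, and computing the coefficient of variation on the raw (positive-valued) data with the population standard deviation is an assumption, since the abstract does not specify either choice.

```python
import numpy as np

def min_max_normalise(X):
    """Rescale each column (indicator) to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    return (X - lo) / (hi - lo)

def z_score_standardise(X):
    """Centre each column to mean 0 and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def coefficient_of_variation_weights(X):
    """Weight each indicator by std / |mean|, normalised so the weights sum to 1."""
    cv = X.std(axis=0) / np.abs(X.mean(axis=0))
    return cv / cv.sum()

# Hypothetical stand-in for the data set: 100 samples, 7 positive-valued indicators.
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(100, 7))

X_mm = min_max_normalise(X)
X_z = z_score_standardise(X)
weights = coefficient_of_variation_weights(X)
```

Indicators with a larger relative spread receive a larger weight under this scheme, which is the usual motivation for the coefficient of variation method.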
To solve the SVM model, the data set is divided into ten groups using the improved ten-fold cross-test, with one group as the training set and nine groups as the validation set, so as to obtain the model with the highest accuracy and its corresponding training data; on this basis, a genetic algorithm (GA) is used to search for the optimal kernel parameters of the SVM model. To solve the decision tree model, the data are divided into a training set and a prediction set in a 7:3 ratio, and the number of leaf nodes is optimised. To solve the random forest classification model, the data are likewise divided into a training set and a prediction set in a 7:3 ratio; when accuracy is similar, the random forest classifier with fewer decision trees is chosen.
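The idea of a GA search over SVM kernel parameters, scored by ten-fold cross-validation, can be sketched with scikit-learn as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: it uses conventional stratified ten-fold cross-validation rather than the "improved" variant described above, an RBF kernel, synthetic data, and a very small GA (truncation selection, uniform crossover, Gaussian mutation). The search bounds, population size and function names are all hypothetical.

```python
import random

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Hypothetical stand-in data: 7 features and a binary "fraud" label.
X, y = make_classification(n_samples=200, n_features=7, random_state=0)

# Assumed search ranges for log10(C) and log10(gamma).
LOG_BOUNDS = [(-2.0, 3.0), (-4.0, 1.0)]

def fitness(genes):
    """Mean ten-fold cross-validated accuracy of an RBF SVM for one gene pair."""
    model = SVC(kernel="rbf", C=10 ** genes[0], gamma=10 ** genes[1])
    return cross_val_score(model, X, y, cv=10).mean()

def clip(value, bounds):
    """Keep a gene inside its search range."""
    return max(bounds[0], min(bounds[1], value))

def genetic_search(pop_size=8, generations=5, mutation_scale=0.3, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(*b) for b in LOG_BOUNDS] for _ in range(pop_size)]
    scored = sorted(((fitness(g), g) for g in population),
                    key=lambda t: t[0], reverse=True)
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        survivors = scored[: pop_size // 2]
        parents = [g for _, g in survivors]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Uniform crossover per gene, then Gaussian mutation, clipped to bounds.
            child = [clip(rng.choice(pair) + rng.gauss(0.0, mutation_scale), bounds)
                     for pair, bounds in zip(zip(a, b), LOG_BOUNDS)]
            children.append(child)
        scored = sorted(survivors + [(fitness(c), c) for c in children],
                        key=lambda t: t[0], reverse=True)
    best_score, best_genes = scored[0]
    return best_score, 10 ** best_genes[0], 10 ** best_genes[1]

best_score, best_C, best_gamma = genetic_search()
print(f"accuracy={best_score:.3f}, C={best_C:.4g}, gamma={best_gamma:.4g}")
```

Searching in log space is a common choice for `C` and `gamma` because useful values span several orders of magnitude; keeping the surviving parents in each generation means the best cross-validated accuracy never decreases from one generation to the next.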

Publisher

Darcy & Roy Press Co. Ltd.
