Author:
Ye Xiaoyan, Xu Wenchao, Ye Xiaoying, Long Dan, Yin Qiuyang, Huang Binhua
Abstract
Early detection of stroke, a severe disease, is a key step toward effective treatment. Stroke data are imbalanced: negative cases (no stroke) form the majority and positive cases (stroke) the minority. Previous work has used SMOTE to handle this imbalance, but most researchers applied it to the entire dataset, so the "answer" is silently carried into the training data through the oversampled set, causing data leakage. Moreover, previous work used accuracy as the only metric, which makes the results less reliable. We propose applying SMOTE to the training set only and use 13 machine learning classifiers to predict stroke. We combine AUC with accuracy as evaluation metrics for the stroke prediction task, which raises the confidence level of the assessment. The experiments show that misused SMOTE and standardization can cause data leakage, and that the combined metrics evaluate models with higher trustworthiness. We conclude that our method avoids data leakage and assesses models with higher trustworthiness.
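The split-before-oversampling idea and the combined metrics can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`smote`, `split_then_oversample`, `roc_auc`, `accuracy`), the toy Gaussian data, the 25% test fraction, and `k=5` are all assumptions made for illustration, and the oversampler is a simplified SMOTE-style interpolation written with the Python standard library only.

```python
import math
import random


def smote(minority, n_new, k=5, rng=None):
    """Simplified SMOTE-style oversampling (illustrative, not a library API):
    each synthetic point lies on the segment between a minority sample and
    one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def split_then_oversample(X, y, test_fraction=0.25, seed=0):
    """Stratified split first, then oversample the TRAINING part only,
    so no synthetic sample is derived from a test sample."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        n_test = int(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    minority = [x for x, t in zip(X_train, y_train) if t == 1]
    n_new = (len(y_train) - len(minority)) - len(minority)
    synthetic = smote(minority, n_new, rng=rng)
    return X_train + synthetic, y_train + [1] * len(synthetic), X_test, y_test


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced data (hypothetical): 190 negatives, 10 positives.
data_rng = random.Random(42)
X = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(190)]
X += [[data_rng.gauss(2, 1), data_rng.gauss(2, 1)] for _ in range(10)]
y = [0] * 190 + [1] * 10
X_train, y_train, X_test, y_test = split_then_oversample(X, y)

# A model that always predicts "no stroke" on a 95/5 label distribution.
labels = [0] * 95 + [1] * 5
always_negative_acc = accuracy(labels, [0] * 100)
always_negative_auc = roc_auc(labels, [0.0] * 100)
```

In this sketch the test set contains only original samples, while the training set is balanced by synthetic positives. The last lines show why accuracy alone can mislead on imbalanced data: the constant "no stroke" predictor scores 0.95 accuracy but only 0.5 AUC.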
Subject
Computer Science Applications, History, Education
Cited by
1 article.