Abstract
Background
Diabetes mellitus (DM) now ranks third among the chronic non-communicable diseases affecting patients, after tumors and cardiovascular and cerebrovascular diseases, and has become one of the world's major public health problems. Identifying individuals at high risk of DM is therefore of great importance for establishing prevention strategies.
Methods
Medical data often have a high-dimensional feature space with highly redundant features, and they are frequently imbalanced across classes. To address these problems, this study explored different supervised classifiers combined with SVM-SMOTE resampling and two feature dimensionality reduction methods (logistic stepwise regression and LASSO) to classify diabetes survey sample data with imbalanced classes and complex related factors. The classification results of 4 supervised classifiers under 4 data processing methods were analyzed and discussed. Five indicators (Accuracy, Precision, Recall, F1-Score and AUC) were selected as the key indicators for evaluating the performance of the classification models.
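As an illustration of the five evaluation indicators named above, the following minimal sketch computes them for a binary classifier from true labels and predicted scores. It is not the study's code: the function name binary_metrics, the 0.5 decision threshold and the toy data are assumptions made for this example, and AUC is computed by the pairwise-ranking definition (the probability that a randomly chosen positive receives a higher score than a randomly chosen negative).

```python
from typing import Dict, Sequence


def binary_metrics(
    y_true: Sequence[int],
    y_score: Sequence[float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Dict[str, float]:
    """Return Accuracy, Precision, Recall, F1-Score and AUC for binary labels."""
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )

    # AUC via pairwise comparison of positive and negative scores;
    # a tie counts as half a win.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    auc = wins / (len(pos) * len(neg)) if pos and neg else float("nan")

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": auc,
    }


# Toy data (hypothetical): 3 positives, 3 negatives.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
for name, value in binary_metrics(labels, scores).items():
    print(f"{name}: {value:.3f}")
```

Accuracy, Precision, Recall and F1-Score depend on the chosen threshold, whereas AUC depends only on how the scores rank positives against negatives, which is one reason to report it alongside the other four.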
Results
The Random Forest classifier combined with SVM-SMOTE resampling and LASSO feature screening (Accuracy = 0.890, Precision = 0.869, Recall = 0.919, F1-Score = 0.893, AUC = 0.948) proved the best way to identify individuals at high risk of DM. The combined algorithm also improved classification performance in predicting high-risk individuals. In addition, age, region, heart rate, hypertension, hyperlipidemia and BMI were the six most critical feature variables affecting diabetes.
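The reported F1-Score can be checked against the reported Precision and Recall, since F1 is their harmonic mean. The short sketch below does that arithmetic; the helper name f1_from is hypothetical, and the inputs are the rounded values quoted above, so only agreement to three decimals can be expected.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Rounded values as reported in the Results.
reported_precision = 0.869
reported_recall = 0.919
reported_f1 = 0.893

recomputed = f1_from(reported_precision, reported_recall)
print(round(recomputed, 3))  # 0.893, consistent with the reported F1-Score
```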
Conclusions
The Random Forest classifier combined with SVM-SMOTE and LASSO feature reduction performed best in identifying individuals at high risk of DM. The combined method proposed in this study would be a good tool for early screening of DM.
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Subject
Health Informatics, Health Policy, Computer Science Applications
Cited by
36 articles.