Affiliation:
1. Universitas Muhammadiyah Berau
Abstract
Background
Diabetes and cardiovascular disease are two of the main causes of death in the United States. Identifying and predicting these diseases in patients is the first step towards stopping their progression. We evaluate the capabilities of machine learning models in detecting at-risk patients using survey data (and laboratory results), and identify key variables within the data contributing to these diseases among the patients.
Methods
Our research explores data-driven approaches that use supervised machine learning models to identify patients with these diseases. Using the National Health and Nutrition Examination Survey (NHANES) dataset, we conduct an exhaustive search of all available feature variables within the data to develop models for cardiovascular disease, prediabetes, and diabetes detection. Across different time frames and feature sets (based on laboratory data), multiple machine learning models (support vector machines and adaptive boosting) were evaluated on their classification performance. The models were then combined into a weighted ensemble model capable of leveraging the performance of the disparate models to improve detection accuracy. The information gain of tree-based models was used to identify the key variables in the patient data that contributed to the data-learned models' detection of at-risk patients in each disease class.
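The weighted ensemble and the information-gain ranking described in the Methods can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the ensemble is assumed to average each model's predicted probability of disease using hypothetical per-model weights (the abstract does not say how the weights were set, and an SVM would need a probability-calibrated output for this to apply), and information gain is computed with the standard entropy definition on one categorical feature at a time rather than read out of a trained tree-based model. The feature names `feature_a` and `feature_b` and all data are hypothetical.

```python
import math
from collections import Counter


def weighted_ensemble_proba(model_probas, model_weights):
    """Weighted average of each model's predicted probability of disease.

    model_probas: one list per model, giving P(disease) for every patient
                  (same patient order in every list).
    model_weights: one non-negative weight per model (hypothetical, e.g.
                   a validation score chosen by the analyst).
    """
    total = sum(model_weights)
    n_patients = len(model_probas[0])
    return [
        sum(wt * probas[i] for wt, probas in zip(model_weights, model_probas)) / total
        for i in range(n_patients)
    ]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Entropy remaining after the split, weighted by group size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy example: rank two survey answers by information gain.
labels = [1, 1, 0, 0]  # 1 = at risk, 0 = not at risk
features = {
    "feature_a": ["yes", "yes", "no", "no"],  # separates the classes perfectly
    "feature_b": ["yes", "no", "yes", "no"],  # carries no information
}
ranking = sorted(features, key=lambda f: information_gain(features[f], labels), reverse=True)
print(ranking)  # ['feature_a', 'feature_b']
```

Here `feature_a` has an information gain of 1 bit and `feature_b` has 0, so `feature_a` ranks first; ranking variables this way is one simple reading of "key variables" in the Methods.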
Results
In this study, Adaptive Boosting (AdaBoost) and Support Vector Machines (SVM) were combined for prediction, with the aim of determining whether AdaBoost SVM could achieve good accuracy. Tests were conducted with 50% of the data used for training and 50% for testing, and the SVM used a dot kernel. The highest accuracy achieved by AdaBoost SVM was 98.54%, suggesting that AdaBoost may improve the performance of SVM in predicting cardiovascular disease (CVD) severity.
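The AdaBoost SVM setup in the Results (a dot-kernel, i.e. linear, SVM as the component classifier, a 50/50 train/test split, and accuracy as the metric) can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes discrete AdaBoost with labels of +1/-1 (the abstract does not name the variant), trains each linear SVM by full-batch subgradient descent on a sample-weighted hinge loss (the hyperparameters `lam`, `epochs`, and `lr` are hypothetical), and uses synthetic two-cluster data in place of NHANES records.

```python
import math
import random


def train_weighted_linear_svm(X, y, sample_weights, lam=0.01, epochs=200, lr=0.1):
    """Linear (dot-kernel) SVM fitted by full-batch subgradient descent on the
    sample-weighted hinge loss plus an L2 penalty (lam / 2) * ||w||^2."""
    d = len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]
        grad_b = 0.0
        for xi, yi, si in zip(X, y, sample_weights):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad_w[j] -= si * yi * xi[j]
                grad_b -= si * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Return +1 or -1 from the sign of the decision value w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


def adaboost_svm(X, y, n_rounds=10):
    """Discrete AdaBoost with weighted linear SVMs as component classifiers."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, w, b)
    for _ in range(n_rounds):
        w, b = train_weighted_linear_svm(X, y, weights)
        preds = [svm_predict(w, b, xi) for xi in X]
        err = sum(wi for wi, p, yi in zip(weights, preds, y) if p != yi)
        if err >= 0.5:
            break  # component no better than chance: stop boosting
        err = max(err, 1e-10)  # avoid division by zero on a perfect fit
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, w, b))
        if err <= 1e-10:
            break  # perfect fit: later rounds would see the same weights
        # Up-weight misclassified samples, down-weight correct ones, renormalize.
        weights = [wi * math.exp(-alpha * yi * p) for wi, yi, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [wi / total for wi in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Sign of the alpha-weighted vote of the component SVMs."""
    score = sum(alpha * svm_predict(w, b, x) for alpha, w, b in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    hits = sum(1 for xi, yi in zip(X, y) if ensemble_predict(ensemble, xi) == yi)
    return hits / len(y)


if __name__ == "__main__":
    # Hypothetical toy data: two Gaussian clusters standing in for patient features.
    rng = random.Random(0)
    data = [([rng.gauss(2, 0.5), rng.gauss(2, 0.5)], 1) for _ in range(50)]
    data += [([rng.gauss(-2, 0.5), rng.gauss(-2, 0.5)], -1) for _ in range(50)]
    rng.shuffle(data)
    half = len(data) // 2  # 50% training, 50% testing
    train, test = data[:half], data[half:]
    model = adaboost_svm([x for x, _ in train], [lab for _, lab in train])
    acc = accuracy(model, [x for x, _ in test], [lab for _, lab in test])
    print(f"rounds: {len(model)}, test accuracy: {acc:.2%}")
```

Because a linear SVM is already a fairly strong learner, boosting can stop after a single round on cleanly separable data like this toy set; the reweighting step only changes the model when an earlier SVM misclassifies some training samples.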
Conclusion
We conclude that machine-learned models based on survey questionnaires can provide an automated mechanism for identifying patients at risk of diabetes and cardiovascular disease. We also identify key contributors to the prediction, which can be explored further for their implications for electronic health records.
Publisher
Research Square Platform LLC