Author:
Sharifi Mahyar,Khatibi Toktam,Emamian Mohammad Hassan,Sadat Somayeh,Hashemi Hassan,Fotouhi Akbar
Abstract
Abstract
Objectives
To develop and to propose a machine learning model for predicting glaucoma and identifying its risk factors.
Method
Data analysis pipeline is designed for this study based on Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The main steps of the pipeline include data sampling, preprocessing, classification and evaluation and validation. Data sampling for providing the training dataset was performed with balanced sampling based on over-sampling and under-sampling methods. Data preprocessing steps were missing value imputation and normalization. For classification step, several machine learning models were designed for predicting glaucoma including Decision Trees (DTs), K-Nearest Neighbors (K-NN), Support Vector Machines (SVM), Random Forests (RFs), Extra Trees (ETs) and Bagging Ensemble methods. Moreover, in the classification step, a novel stacking ensemble model is designed and proposed using the superior classifiers.
Results
The data were from Shahroud Eye Cohort Study including demographic and ophthalmology data for 5190 participants aged 40-64 living in Shahroud, northeast Iran. The main variables considered in this dataset were 67 demographics, ophthalmologic, optometric, perimetry, and biometry features for 4561 people, including 4474 non-glaucoma participants and 87 glaucoma patients. Experimental results show that DTs and RFs trained based on under-sampling of the training dataset have superior performance for predicting glaucoma than the compared single classifiers and bagging ensemble methods with the average accuracy of 87.61 and 88.87, the sensitivity of 73.80 and 72.35, specificity of 87.88 and 89.10 and area under the curve (AUC) of 91.04 and 94.53, respectively. The proposed stacking ensemble has an average accuracy of 83.56, a sensitivity of 82.21, a specificity of 81.32, and an AUC of 88.54.
Conclusions
In this study, a machine learning model is proposed and developed to predict glaucoma disease among persons aged 40-64. Top predictors in this study considered features for discriminating and predicting non-glaucoma persons from glaucoma patients include the number of the visual field detect on perimetry, vertical cup to disk ratio, white to white diameter, systolic blood pressure, pupil barycenter on Y coordinate, age, and axial length.
Publisher
Springer Science and Business Media LLC
Subject
Computational Mathematics,Computational Theory and Mathematics,Computer Science Applications,Genetics,Molecular Biology,Biochemistry
Reference41 articles.
1. Adelson, J.D., et al., Causes of blindness and vision impairment in 2020 and trends over 30 years, and prevalence of avoidable blindness in relation to VISION 2020: the Right to Sight: an analysis for the Global Burden of Disease Study. The Lancet Global Health, 2020.
2. Flaxman SR, et al. Global causes of blindness and distance vision impairment 1990–2020: a systematic review and meta-analysis. The Lancet Global Health. 2017;5(12):e1221–34.
3. Tham Y-C, et al. Global prevalence of glaucoma and projections of glaucoma burden through 2040: a systematic review and meta-analysis. Ophthalmology. 2014;121(11):2081–90.
4. McMonnies CW. Glaucoma history and risk factors. Journal of optometry. 2017;10(2):71–8.
5. Hashemi H, et al. Prevalence and risk factors of glaucoma in an adult population from Shahroud, Iran. Journal of Current Ophthalmology. 2019;31(4):366–72.
Cited by
2 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献