BACKGROUND
Periodontitis is a multifactorial disease that involves numerous risk factors and indicators. While a few factors, such as age, uncontrolled diabetes, and smoking, are well established as true risk factors for periodontitis, many other proposed risk factors and indicators remain debated because previous studies have reported conflicting data. This calls for a novel approach to data analysis to improve the accuracy of risk assessment and disease prediction.
OBJECTIVE
This study aimed to assess the ability of machine learning approaches to identify important risk indicators for periodontitis using data on 13,946 subjects from the 2015–2018 Korea National Health and Nutrition Examination Survey.
METHODS
The severity of periodontitis was categorized as non-severe or severe according to the community periodontal index. Machine learning models such as classification and regression tree, gradient boosting machine, random forest, extreme gradient boosting (XGBoost), and multilayer perceptron (MLP) were developed, and their performance, measured by the area under the receiver operating characteristic curve (AUC), was compared with that of conventional logistic regression analysis.
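As an illustration of the comparison metric only, the following minimal sketch computes the AUC as the probability that a randomly chosen severe case receives a higher score than a randomly chosen non-severe case. The function name `roc_auc`, the toy labels, and the two sets of predicted scores are hypothetical and are not taken from the survey data or from the study's models.

```python
def roc_auc(labels, scores):
    """Return the AUC for binary labels (1 = positive, 0 = negative).

    The AUC equals the fraction of positive/negative pairs in which the
    positive case has the higher score; tied pairs count as half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy labels: 1 = severe periodontitis, 0 = non-severe.
y_true = [0, 0, 1, 1, 0, 1]
# Hypothetical predicted probabilities from two models being compared.
scores_logistic = [0.2, 0.6, 0.5, 0.7, 0.3, 0.4]
scores_boosted = [0.1, 0.55, 0.6, 0.9, 0.3, 0.5]

auc_logistic = roc_auc(y_true, scores_logistic)
auc_boosted = roc_auc(y_true, scores_boosted)
print(f"logistic AUC: {auc_logistic:.3f}, boosted AUC: {auc_boosted:.3f}")
```

The pairwise loop is quadratic in the number of subjects, so a rank-based implementation from an established library would be preferable for a sample of this size.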
RESULTS
XGBoost and MLP showed higher performance than the logistic regression model. The important risk indicators for periodontitis were age, sex, education, smoking, blood pressure, use of interdental cleaning aids, and glycated hemoglobin. Interestingly, further analysis showed that glycated hemoglobin and frequency of binge drinking were significant indicators of severe periodontitis. Although the machine learning models showed moderate AUC values, the ranking of risk indicators was consistent across feature importance analyses such as Gini impurity, permutation importance, and Shapley additive explanations.
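To make one of these feature importance analyses concrete, the sketch below illustrates permutation importance: a feature's values are shuffled across subjects, and the resulting drop in model performance is taken as that feature's importance. Everything here is an assumption for illustration: `toy_model` is a fixed rule standing in for a trained model, the two columns (age and a second, unused indicator) are invented, and accuracy is used as the score for simplicity rather than the AUC reported in the study.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, y in zip(rows, labels) if model(row) == y)
    return correct / len(labels)


def permutation_importance(model, rows, labels, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other features untouched.
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical stand-in for a trained model: uses age (column 0) only."""
    return 1 if row[0] >= 50 else 0


# Hypothetical rows: [age, second indicator that the toy model ignores].
rows = [[34, 1], [61, 0], [47, 1], [72, 1], [55, 0], [29, 0], [66, 1], [41, 0]]
labels = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, labels)
print(importances)
```

In this toy setup, shuffling the ignored column leaves every prediction unchanged, so its importance is zero, while shuffling age lowers accuracy. Ranking features by such scores and comparing the ranking with those from other methods is the kind of consistency check described above.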
CONCLUSIONS
Collectively, our findings suggest that novel data analysis methodologies, such as machine learning, merit consideration for better management of periodontitis.