Abstract
Purpose
The purpose of this study was to construct and select an optimal risk prediction model for papillary thyroid microcarcinoma (PTMC), to help determine whether surgery is needed for an individual patient and to reduce the risk of overtreatment.
Methods
This study enrolled 17,768 patients with PTMC from the Surveillance, Epidemiology, and End Results (SEER) database. All participants were randomly assigned in a 6:2:2 ratio to a training set (n = 10,660), a test set (n = 3,554), and a validation set (n = 3,554). Five machine learning (ML) models were constructed in Python 3.8.0: random forest (RF), XGBoost, LightGBM, logistic regression (LR), and k-nearest neighbors (KNN). Optimal model parameters were obtained through 10-fold cross-validation with grid search tuning. The receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), sensitivity, accuracy, precision, specificity, and the Brier score were used to compare the predictive ability of the five models.
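The split and tuning steps above can be sketched in plain Python. This is not the authors' code: it assumes a simple unstratified random split with an arbitrary seed, and it takes a hypothetical caller-supplied `score_fn` standing in for "fit a model on the training folds and score it (for example by AUC) on the held-out fold". The hyperparameter names and values in the usage lines are hypothetical placeholders. Libraries such as scikit-learn offer `train_test_split` and `GridSearchCV` for the same purpose; the stdlib version just makes each step explicit.

```python
import itertools
import random


def split_6_2_2(n, seed=0):
    """Randomly partition record indices 0..n-1 into train/test/validation at about 6:2:2."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seed is arbitrary, chosen only for reproducibility
    n_test = round(n * 0.2)               # ~20% held out as the test set
    n_val = round(n * 0.2)                # ~20% held out as the validation set
    n_train = n - n_test - n_val          # remaining ~60% used for training
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    val = indices[n_train + n_test:]
    return train, test, val


def k_fold_indices(train, k=10):
    """Yield (fit, held_out) index lists for k-fold cross-validation on the training set."""
    folds = [train[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        fit = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit, folds[i]


def grid_search_cv(train, param_grid, score_fn, k=10):
    """Return the parameter combination with the highest mean k-fold score.

    score_fn(params, fit_idx, held_out_idx) is a hypothetical caller-supplied function
    that trains a model on fit_idx and returns a score (e.g. AUC) on held_out_idx.
    """
    keys = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        scores = [score_fn(params, fit, held) for fit, held in k_fold_indices(train, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


train, test, val = split_6_2_2(17768)
print(len(train), len(test), len(val))  # 10660 3554 3554


# Hypothetical toy scorer standing in for "fit a model, then score it on the held-out fold".
def toy_score(params, fit_idx, held_idx):
    return -abs(params["max_depth"] - 5) - abs(params["learning_rate"] - 0.1)


best, _ = grid_search_cv(train, {"max_depth": [3, 5, 7], "learning_rate": [0.05, 0.1]}, toy_score)
print(best)  # {'max_depth': 5, 'learning_rate': 0.1}
```

Rounding 20% of 17,768 gives 3,554 for each held-out set, leaving 10,660 for training, which matches the set sizes reported above.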
Results
Of these patients, most (70.2%) were younger than 55 years. XGBoost was the best of the five models, with a mean AUC of 0.7883, followed by the LR model (AUC = 0.7880). The XGBoost model also achieved the highest sensitivity, accuracy, precision, and specificity, at 0.7991, 0.8796, 0.8036, and 0.8036, respectively.
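The metrics compared above follow standard definitions. As a minimal sketch, assuming a binary outcome coded 1 for the positive class, predicted probabilities for that class, and a 0.5 classification threshold (the abstract does not state the cut-off used), they can be computed as follows. AUC is computed here as the probability that a randomly chosen positive case receives a higher predicted probability than a randomly chosen negative case, with ties counted as one half; scikit-learn's `roc_auc_score` and `brier_score_loss` are library equivalents.

```python
def _safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero (metric undefined)."""
    return num / den if den else float("nan")


def roc_auc(y_true, y_prob):
    """AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return _safe_div(wins, len(pos) * len(neg))  # O(n_pos * n_neg); fine for a sketch


def evaluate(y_true, y_prob, threshold=0.5):
    """Threshold-based metrics plus Brier score and AUC for a binary classifier."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]  # assumed 0.5 cut-off
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    fn = pairs.count((1, 0))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    return {
        "sensitivity": _safe_div(tp, tp + fn),       # true-positive rate (recall)
        "specificity": _safe_div(tn, tn + fp),       # true-negative rate
        "precision": _safe_div(tp, tp + fp),         # positive predictive value
        "accuracy": _safe_div(tp + tn, len(pairs)),  # fraction classified correctly
        # Brier score: mean squared error of the probabilities; lower is better.
        "brier": sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true),
        "auc": roc_auc(y_true, y_prob),
    }


result = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(result["auc"], result["sensitivity"], result["specificity"])  # 0.75 0.5 1.0
```

Unlike the threshold-based metrics, AUC and the Brier score use the predicted probabilities directly, so they do not depend on the chosen cut-off.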
Conclusion
XGBoost can be used as the optimal model to identify risk in patients with PTMC. These findings provide new insights into risk assessment for patients with PTMC and can help avoid overtreatment.