Affiliation:
1. Guangzhou Medical University
2. Guangdong Province Women and Children Hospital
Abstract
Background: As the accuracy of existing predictive models for vulval cancer patients is limited, this study aims to construct and compare machine learning (ML) models that predict the risk of lymph node metastasis in vulval cancer, using the Surveillance, Epidemiology, and End Results (SEER) public database of the National Cancer Institute.
Methods: Data for patients registered between 2010 and 2015 were extracted from the SEER database and randomly divided into a training set and a validation set (7:3). Six ML algorithms were used to develop predictive models for distant metastasis: multilayer perceptron (MLP), support vector machine (SVM), naïve Bayes classifier (NBC), decision tree (DT), random forest (RF), and k-nearest neighbors (KNN). The models were evaluated and compared using the area under the receiver operating characteristic curve (AUC-ROC) and decision curve analysis (DCA).
Results: A total of 6,813 patients were included and randomly divided into a training set (N = 4,768) and a validation set (N = 2,045). The Boruta algorithm identified 11 important factors. In the training set, the RF model performed best (AUC = 0.820), significantly better than the other five models. In the validation set, the RF model also showed better predictive ability than the other models (AUC = 0.799), a finding supported by the DCA results. Recursive feature elimination (RFE) was then used to select key variables in the RF model, and five important factors were finally determined, of which tumor T stage was the most important.
Conclusion: The RF model proved to be an effective algorithm with better predictive ability than the other models tested. This model is intended to support future decisions regarding the risk of lymph node metastasis in vulval cancer.
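The evaluation steps named in the Methods (a 7:3 random split, AUC-ROC, and DCA) can be illustrated with a minimal sketch. It uses only the Python standard library and is not the authors' code: the function names, the fixed seed, and the toy labels and scores are assumptions made for illustration. The net benefit formula is the standard one used in decision curve analysis, TP/n - (FP/n) * pt / (1 - pt), where pt is the threshold probability.

```python
import random


def split_7_3(records, seed=0):
    """Shuffle the records and split them 7:3 into training and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seed is an arbitrary illustrative choice
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def auc_roc(labels, scores):
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """DCA net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt)."""
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


# Toy data (hypothetical, not from SEER): 1 = metastasis, 0 = no metastasis.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

train, valid = split_7_3(range(10))
print(len(train), len(valid))
print(auc_roc(toy_labels, toy_scores))
print(net_benefit(toy_labels, toy_scores, 0.5))
```

Comparing models in this way means computing the AUC once per model on the validation set and plotting net benefit over a range of thresholds; a model with a higher curve across clinically relevant thresholds is preferred.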
Publisher
Research Square Platform LLC