Abstract
Background
Patients with gastric cancer with liver metastasis (GCLM) typically have a poor prognosis and a high risk of early mortality. This study aimed to use machine learning (ML) methods to predict cancer-specific early mortality in patients with GCLM and to identify its risk factors.
Methods
Data on patients with GCLM were obtained from the Surveillance, Epidemiology, and End Results (SEER) database. Least absolute shrinkage and selection operator (LASSO) regression and univariate and multivariate logistic regression analyses were used to identify independent risk factors for cancer-specific early death (CSED). Seven models, namely logistic regression (LR), decision tree (DT), K-nearest neighbors (KNN), light gradient boosting machine (LightGBM), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost), were used to predict CSED and extract important features. Model performance was assessed with tenfold cross-validation, receiver operating characteristic (ROC) curve analysis, accuracy, balanced accuracy, precision, sensitivity, specificity, F1-score, precision-recall (PR) curve analysis, calibration curve analysis, and decision curve analysis (DCA). Feature importance was computed with the DALEX package.
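The abstract does not describe how the threshold-based metrics were computed. A minimal standard-library Python sketch of their usual definitions follows. It assumes binary labels (1 = CSED, 0 = no CSED), predicted probabilities from any of the models, and a classification threshold chosen here only for illustration. The function names are hypothetical and are not the authors' code. The net benefit function uses the standard decision-curve formula NB = TP/N - (FP/N) * p_t / (1 - p_t), where the confusion counts must be computed at that same threshold p_t. ROC, PR, and calibration curves are obtained by sweeping thresholds or binning probabilities and are not shown.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int
    fp: int
    tn: int
    fn: int


def confusion_counts(y_true, y_prob, threshold=0.5):
    """Tally a 2x2 confusion matrix from labels (1 = CSED) and predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0  # classify as CSED at or above the threshold
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _safe_div(num, den):
    """Return 0.0 instead of raising when a denominator is empty."""
    return num / den if den else 0.0


def classification_metrics(c):
    """Threshold-based metrics listed in the Methods, from one confusion matrix."""
    sensitivity = _safe_div(c.tp, c.tp + c.fn)  # recall, true-positive rate
    specificity = _safe_div(c.tn, c.tn + c.fp)  # true-negative rate
    precision = _safe_div(c.tp, c.tp + c.fp)    # positive predictive value
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _safe_div(c.tp + c.tn, total),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _safe_div(2 * precision * sensitivity, precision + sensitivity),
    }


def net_benefit(c, threshold):
    """Decision-curve net benefit: TP/N - FP/N * p_t / (1 - p_t), counts taken at p_t."""
    n = c.tp + c.fp + c.tn + c.fn
    return c.tp / n - c.fp / n * threshold / (1 - threshold)


if __name__ == "__main__":
    # Toy labels and probabilities, invented purely for illustration.
    y_true = [1, 1, 0, 0, 1, 0]
    y_prob = [0.9, 0.4, 0.2, 0.6, 0.7, 0.1]
    counts = confusion_counts(y_true, y_prob, threshold=0.5)
    print(counts)
    print(classification_metrics(counts))
    print(net_benefit(counts, 0.5))
```

On the toy data every metric equals 2/3, which shows why balanced accuracy and F1 are reported alongside accuracy: they only diverge from it when classes or error types are imbalanced. With 45% of patients experiencing CSED, the classes here are close to balanced, but sensitivity and specificity can still differ.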
Results
A total of 3661 patients were included, of whom 1648 (45%) experienced CSED. Among the 7 ML models, XGBoost achieved the best performance. In the XGBoost model, the 6 most influential factors were chemotherapy, months from diagnosis to therapy, age, grade, N stage, and surgery, with chemotherapy being the most important.
Conclusion
The XGBoost model may be applied to predict CSED in patients with GCLM, and chemotherapy was the most important feature in this model. These results could offer valuable reference data to help clinicians make informed decisions in advance.