Abstract
Objectives: This study aimed to use the random forest method to develop a practical diagnostic function for predicting lymph node metastasis in patients with breast cancer. Methods: In this retrospective cohort study, data were obtained from telephone interviews and the medical records of 241 patients with breast cancer referred to hospitals affiliated with Mazandaran University of Medical Sciences between 2016 and 2022. Random forest analysis, performed in R, was used to identify the factors associated with lymph node metastasis. Results: The mean age at diagnosis was 52.03 ± 10.932 years. The random forest model achieved an accuracy of 72.2%. Based on the mean decrease in accuracy, the influential factors were grade, tubule formation, skin involvement, p53 marker, margin involvement, nuclear pleomorphism, Ki67, tumor location, and the estrogen receptor (ER) and progesterone receptor (PR) markers. Based on the mean decrease in Gini index, the important variables were age, grade, tubule formation, tumor size, nuclear pleomorphism, disease level, mitosis, skin involvement, tumor location, and margin involvement. Conclusions: The random forest algorithm, which showed a favorable level of discriminative capability, may be a suitable approach for predicting lymph node metastasis in patients with breast cancer. Furthermore, identifying these factors may help experts adopt effective strategies to manage the disease.
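The two importance measures named in the Results can be illustrated with a minimal Python sketch. It is not the study's R code and uses no study data: the toy rows, the column names (grade, noise), and the one-rule classifier are hypothetical and chosen only to make the arithmetic visible. In an actual random forest, the mean decrease in accuracy is typically computed per tree on out-of-bag samples, and the mean decrease in Gini is accumulated over every split that uses a variable; the sketch shows only the core calculation behind each measure.

```python
import random
from collections import Counter


def gini_impurity(labels):
    """Gini impurity, 1 - sum(p_k^2), of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(labels, goes_left):
    """Impurity reduction from splitting `labels` with a boolean mask."""
    left = [y for y, g in zip(labels, goes_left) if g]
    right = [y for y, g in zip(labels, goes_left) if not g]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


def accuracy(predict, rows, labels):
    """Fraction of rows for which `predict` returns the true label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(labels)


def permutation_importance(predict, rows, labels, column, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling the values of one column."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(n_repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + (v,) + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / n_repeats


# Hypothetical toy data: each row is (grade, noise); label 1 = node-positive.
rows = [(1, 0), (1, 1), (2, 0), (2, 1), (3, 0), (3, 1), (3, 0), (3, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]


def predict(row):
    """Hypothetical stand-in for a fitted model: positive if grade >= 3."""
    return int(row[0] >= 3)


# Gini-based view: how much does splitting on each column purify the labels?
gini_grade = gini_decrease(labels, [r[0] >= 3 for r in rows])
gini_noise = gini_decrease(labels, [r[1] == 1 for r in rows])

# Accuracy-based view: how much does accuracy fall when a column is shuffled?
acc_grade = permutation_importance(predict, rows, labels, column=0)
acc_noise = permutation_importance(predict, rows, labels, column=1)

print("Gini decrease:", {"grade": gini_grade, "noise": gini_noise})
print("Accuracy decrease:", {"grade": acc_grade, "noise": acc_noise})
```

In this toy setting, the informative column scores higher than the uninformative one on both measures. The two measures need not agree in general, which is consistent with the two partly different variable lists reported above.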