BACKGROUND
Breast cancer incidence may be higher among patients with type 2 diabetes mellitus (T2DM) than in the general population.
OBJECTIVE
This study evaluated the performance of three models for predicting breast cancer risk in patients with T2DM.
METHODS
In total, 1,267,867 patients with newly diagnosed T2DM between 2000 and 2012 were identified from Taiwan's National Health Insurance Research Database. Using their data, we built prediction models to identify T2DM patients at increased risk of subsequently developing breast cancer. Available potential risk factors for breast cancer were also collected for adjustment in the analyses. The Synthetic Minority Oversampling Technique (SMOTE) was used to augment data points in the minority class. Each data point was randomly allocated to the training or test set at a ratio of approximately 39:1. The performance of artificial neural network (ANN), logistic regression (LR), and random forest (RF) models was determined using recall, precision, F1 score, and area under the receiver operating characteristic curve (AUC).
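As a rough illustration of the oversampling and allocation steps described above, the following minimal sketch shows the core idea of SMOTE (interpolating between a minority-class point and one of its nearest minority-class neighbours) and a random split at about 39:1. The function names `smote` and `split`, the neighbour count `k`, and the seeds are assumptions made for illustration; the abstract does not specify the software or parameters the study used.

```python
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by linear interpolation.

    Assumes `minority` holds at least two numeric tuples of equal length.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of `base` (squared Euclidean distance)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic


def split(rows, test_fraction=1 / 40, seed=0):
    """Randomly allocate rows to training and test sets (39:1 by default)."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Toy data (hypothetical): four minority-class points in the unit square.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
synthetic = smote(minority, n_new=6, k=2)
train, test = split(range(400))
```

Each synthetic point lies on the line segment between two real minority-class points, so the minority class is enlarged without duplicating existing records.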
RESULTS
The AUCs of all three models were significantly higher than 0.5, the value expected under the null hypothesis (0.959, 0.865, and 0.834 for the RF, ANN, and LR models, respectively). The RF model had the largest AUC and also the highest values for all other metrics.
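For readers unfamiliar with these metrics, the sketch below shows one way recall, precision, F1 score, and AUC can be computed from labels and model scores. It is illustrative only: the function names and the toy data are hypothetical, and the AUC is computed here through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), which is why an uninformative model yields 0.5.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties count as half. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example (hypothetical data, not from the study).
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
model_auc = auc(y_true, scores)
chance_auc = auc(y_true, [0.5] * len(y_true))  # constant scores carry no information
```

The pairwise computation is quadratic in the number of cases, so it suits small illustrations; for a cohort of this size a sort-based implementation would be needed.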
CONCLUSIONS
Although all three models could accurately predict high breast cancer risk in patients with T2DM in Taiwan, the RF model demonstrated the best performance.
CLINICALTRIAL
This is not a clinical trial.