Affiliation:
1. University of Valladolid
2. University Hospital of Valladolid
Abstract
Breast cancer is a significant health problem, with about 2 million new cases diagnosed and 600,000 deaths each year. Early detection and accurate diagnosis are critical to patient prognosis, and machine learning (ML) models show promising results for accurate and efficient diagnosis. This work studies the performance of several ML models on the publicly available Wisconsin Breast Cancer Dataset. The models are based on logistic regression, Random Forest, Naïve Bayes, and Support Vector Machine (SVM) algorithms, with SVM performing best. An ensemble combining the best of the proposed models is then implemented. It consists of three models: an SVM trained on standardized data; a logistic regression trained on standardized data reduced to 10 principal components by PCA; and a Random Forest with 60 estimators trained on standardized data. All models are evaluated on a test set comprising 30% of the original dataset. The models are combined by weighted majority voting, with a weight of 0.5 for the SVM and 0.25 each for the logistic regression and Random Forest models. The voting ensemble improves on the individual models, reaching an accuracy of 98%, precision of 97%, recall of 99%, and F1 score of 98%.
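The weighted voting rule described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say whether voting is hard (on predicted labels) or soft (on predicted probabilities), nor how ties are resolved, so both variants are shown and the tie-break is an explicit parameter. The model names `svm`, `logreg`, and `rf`, the function names, and the 0/1 label encoding are hypothetical; only the weights 0.5, 0.25, and 0.25 come from the text.

```python
from collections import defaultdict

# Weights as stated in the abstract; the dictionary keys are illustrative labels.
WEIGHTS = {"svm": 0.5, "logreg": 0.25, "rf": 0.25}


def hard_vote(predictions, weights=WEIGHTS, tie_break=None):
    """Weighted majority vote over predicted labels for one sample.

    predictions: dict mapping model name -> predicted label.
    tie_break: label returned on an exact tie (an assumption; the abstract
    does not specify tie handling). If None, the smallest tied label wins.
    """
    totals = defaultdict(float)
    for model, label in predictions.items():
        totals[label] += weights[model]
    best = max(totals.values())
    winners = sorted(label for label, total in totals.items() if total == best)
    if len(winners) > 1 and tie_break in winners:
        return tie_break
    return winners[0]


def soft_vote(probabilities, weights=WEIGHTS):
    """Weighted average of class probabilities for one sample.

    probabilities: dict mapping model name -> list of per-class probabilities.
    Returns the index of the class with the highest weighted average.
    """
    n_classes = len(next(iter(probabilities.values())))
    averaged = [0.0] * n_classes
    for model, probs in probabilities.items():
        for cls, p in enumerate(probs):
            averaged[cls] += weights[model] * p
    return max(range(n_classes), key=lambda cls: averaged[cls])


# SVM and logistic regression agree: total weight 0.75 against 0.25.
majority = hard_vote({"svm": 1, "logreg": 1, "rf": 0})

# SVM disagrees with both other models: 0.5 against 0.5, an exact tie.
tied = hard_vote({"svm": 0, "logreg": 1, "rf": 1}, tie_break=1)
```

One consequence of these weights under hard voting is that the SVM can never be outvoted: when the other two models agree against it, the totals are 0.5 each, so the outcome depends entirely on the tie-breaking rule. Soft voting avoids this in most cases because the probabilities rarely balance exactly.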
Publisher
Research Square Platform LLC