Affiliation:
1. Loyola College
2. ICMR - National Institute for Research in Tuberculosis
Abstract
BACKGROUND: Breast cancer is the most common type of cancer in women worldwide and the leading cause of cancer death among women. Although many breast cancer patients have no family history of the disease, women with a family history are at higher risk than those without.
OBJECTIVE: The aim of this research is to classify the death status of breast cancer patients using the Surveillance, Epidemiology, and End Results (SEER) dataset. Because it can handle enormous datasets systematically, machine learning has been widely employed in biomedical research to solve diverse classification problems. Pre-processing the data enables its visualization and analysis, which in turn support important decisions.
METHODOLOGY: This research presents a feasible machine learning-based approach for classifying breast cancer data. A two-step feature selection method based on Variance Threshold and Principal Component Analysis (PCA) was employed to select features from the SEER breast cancer dataset. After feature selection, the breast cancer dataset was classified using supervised learning techniques: the ensemble methods AdaBoost (AB), XGBoost (XGB), and Gradient Boosting (GB), and the binary classifiers Naive Bayes (NB) and Decision Tree (DT).
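The two-step feature selection described above might be sketched with scikit-learn as follows. This is an illustrative sketch, not the authors' code: synthetic data stands in for the SEER dataset, and the standardization step, the zero variance threshold, and the 95% explained-variance cutoff for PCA are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the SEER breast cancer features (assumption: the real
# dataset is not reproduced here). Binary target, e.g. 0 = alive, 1 = dead.
X, y = make_classification(n_samples=500, n_features=15, n_informative=6, random_state=0)

# Append two constant columns so the variance filter has something to remove.
X = np.hstack([X, np.zeros((X.shape[0], 2))])

feature_selector = Pipeline([
    # Step 1: drop features whose variance does not exceed the threshold (0.0 here).
    ("variance", VarianceThreshold(threshold=0.0)),
    # PCA is scale-sensitive, so the surviving features are standardized first (assumption).
    ("scale", StandardScaler()),
    # Step 2: keep enough principal components to explain 95% of the variance (assumed cutoff).
    ("pca", PCA(n_components=0.95, random_state=0)),
])

X_reduced = feature_selector.fit_transform(X)
kept = feature_selector.named_steps["variance"].get_support().sum()
print(X.shape[1], "features ->", kept, "after variance filter ->", X_reduced.shape[1], "components")
```

Keeping both steps in one `Pipeline` means the same fitted variance filter and PCA projection can later be applied unchanged to held-out data.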
RESULTS: The Decision Tree (DT) algorithm outperformed the other algorithms used in this analysis (AB, XGB, GB, and NB), achieving an accuracy of 98% under both the train-test split and cross-validation.
CONCLUSION: The performance of several machine learning algorithms was examined using the train-test split and k-fold cross-validation approaches. According to the experimental results, the Decision Tree algorithm outperforms the other supervised and ensemble learning approaches.
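A minimal sketch of this kind of evaluation, comparing classifiers with a hold-out train-test split and stratified k-fold cross-validation. Assumptions, none of them taken from the study: synthetic data instead of SEER, an 80/20 split, k = 10, default hyperparameters, and XGBoost omitted so that only scikit-learn is required (`xgboost.XGBClassifier` could be added to `models` if that package is installed). The accuracies this prints say nothing about the 98% reported for SEER.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the SEER data (assumption).
X, y = make_classification(n_samples=500, n_features=15, n_informative=6, random_state=0)

# XGBoost is left out to avoid an extra dependency (assumption).
models = {
    "AB": AdaBoostClassifier(random_state=0),
    "GB": GradientBoostingClassifier(random_state=0),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
}

def build(model):
    # Feature selection sits inside the pipeline, so it is refit on each
    # training split and never sees the corresponding test data.
    return make_pipeline(VarianceThreshold(), StandardScaler(), PCA(n_components=0.95), model)

# Hold-out evaluation: stratified 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
# k-fold evaluation: stratified 10-fold cross-validation on the full data.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    holdout = build(model).fit(X_train, y_train).score(X_test, y_test)
    cv_mean = cross_val_score(build(model), X, y, cv=cv, scoring="accuracy").mean()
    results[name] = (holdout, cv_mean)
    print(f"{name}: hold-out accuracy={holdout:.3f}, 10-fold CV accuracy={cv_mean:.3f}")
```

Reporting both numbers side by side shows whether a model's hold-out accuracy is consistent with its average across folds, which is the comparison the conclusion draws for DT.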
Publisher
Research Square Platform LLC