Affiliation:
1. Accra Technical University, Accra, Ghana
2. Vita Verde Consult, Accra, Ghana
Abstract
Breast cancer is the most common cancer and the leading cause of cancer deaths in women worldwide. Classifying breast cancer data can help predict disease outcomes or reveal the genetic behavior of tumors. Data mining techniques support this by identifying potential cancer patients from patient data alone. This study examines the determinant factors of breast cancer and analyzes breast cancer patient data to build a useful classification model using a data mining approach. Of the 2397 women in the study, 1022 (42.64%) were diagnosed with breast cancer. Four learning techniques were used: Random Forest, Naive Bayes, Classification and Regression Trees (CART), and a Boosted Tree model. Random Forest had the highest accuracy, 0.9892 (95% CI, 0.9832-0.9935), with a sensitivity of about 92%. This suggests that, of the four models compared, Random Forest is the best for classifying and predicting breast cancer based on associated factors.
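The abstract reports accuracy with a 95% confidence interval and a sensitivity for each classifier. As a rough illustration of how such figures can be derived from predicted and true labels, the following minimal sketch computes them in plain Python. It rests on several assumptions: the abstract does not say which interval method was used, so the Wilson score interval here is a stand-in; the function names are hypothetical; and the toy labels are invented for illustration and are not the study's data.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def sensitivity(tp, fn):
    """Share of true positive cases that the model detected (recall)."""
    return tp / (tp + fn)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%).

    Assumption: chosen for illustration; the study may have used
    a different interval method.
    """
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Toy labels (illustrative only, NOT the study's data): 1 = diagnosed, 0 = not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
low, high = wilson_interval(tp + tn, len(y_true))
print(f"accuracy={acc:.4f} (95% CI, {low:.4f}-{high:.4f})")
print(f"sensitivity={sensitivity(tp, fn):.4f}")
```

Running the same functions on each model's held-out predictions would give comparable accuracy and sensitivity figures for the four classifiers, which is the kind of comparison the abstract summarizes.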