Abstract
Breast cancer (BC) is the most prevalent form of cancer among women. Recent research has demonstrated the potential of Machine Learning (ML) techniques for predicting five-year BC risk from personal health data. Support Vector Machine (SVM), Random Forest, K-Nearest Neighbour (K-NN), Naive Bayes, Neural Network, Decision Tree (DT), Logistic Regression (LR), Discriminant Analysis, and their variants are commonly employed in ML-based BC analysis. This study investigates the factors that influence the performance of ML techniques in BC prevention. Specifically, it examines the effect of dataset cardinality, feature selection, and model selection on analytical performance, measured in terms of Accuracy and Area Under the Curve (AUC). To this end, 3917 papers published in the previous 5 years were automatically retrieved from Scopus and PubMed; after applying inclusion and exclusion criteria, 54 articles were retained for analysis. Our findings indicate that adequate dataset cardinality and effective feature selection have a greater impact on performance than the choice of model. This is corroborated by one of the reviewed studies, which obtained extremely good results with every model it employed.
Funder
Consiglio Nazionale Delle Ricerche
Publisher
Springer Science and Business Media LLC
Cited by
1 article.