Abstract
Background
Accurately predicting the survival rate of breast cancer patients is a major issue for cancer researchers. Machine learning (ML) has attracted much attention with the hope that it could provide accurate results, but its modeling methods and prediction performance remain controversial. The aim of this systematic review is to identify and critically appraise current studies regarding the application of ML in predicting the 5-year survival rate of breast cancer.
Methods
In accordance with the PRISMA guidelines, two researchers independently searched the PubMed (including MEDLINE), Embase, and Web of Science Core databases from inception to November 30, 2020. The search terms included breast neoplasms, survival, machine learning, and specific algorithm names. The included studies related to the use of ML to build a breast cancer survival prediction model and model performance that can be measured with the value of said verification results. The excluded studies in which the modeling process were not explained clearly and had incomplete information. The extracted information included literature information, database information, data preparation and modeling process information, model construction and performance evaluation information, and candidate predictor information.
Results
Thirty-one studies that met the inclusion criteria were included, most of which were published after 2013. The most frequently used ML methods were decision trees (19 studies, 61.3%), artificial neural networks (18 studies, 58.1%), support vector machines (16 studies, 51.6%), and ensemble learning (10 studies, 32.3%). The median sample size was 37256 (range 200 to 659820) patients, and the median predictor was 16 (range 3 to 625). The accuracy of 29 studies ranged from 0.510 to 0.971. The sensitivity of 25 studies ranged from 0.037 to 1. The specificity of 24 studies ranged from 0.008 to 0.993. The AUC of 20 studies ranged from 0.500 to 0.972. The precision of 6 studies ranged from 0.549 to 1. All of the models were internally validated, and only one was externally validated.
Conclusions
Overall, compared with traditional statistical methods, the performance of ML models does not necessarily show any improvement, and this area of research still faces limitations related to a lack of data preprocessing steps, the excessive differences of sample feature selection, and issues related to validation. Further optimization of the performance of the proposed model is also needed in the future, which requires more standardization and subsequent validation.
Funder
The Bethune Project of Jilin University
Health and Health Science and Technology Innovation Self-funded Project of Jilin Province
Publisher
Public Library of Science (PLoS)
Reference83 articles.
1. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries.;F Bray;CA: a cancer journal for clinicians.,2018
2. Predicting breast cancer survivability: a comparison of three data mining methods;D Delen;Artificial intelligence in medicine,2005
3. Heterogeneity in breast cancer;K Polyak;The Journal of clinical investigation,2011
4. Prognostic models: a methodological framework and review of models for breast cancer.;Altman;Cancer Investigation,2009
5. Do we really need prognostic factors for breast cancer?;GM Clark;Breast cancer research and treatment,1994
Cited by
73 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献