Affiliation:
1. Universidade de São Paulo Faculdade de Medicina de Ribeirão Preto: Universidade de Sao Paulo Faculdade de Medicina de Ribeirao Preto
2. University of Sao Paulo Campus of Ribeirao Preto: Universidade de Sao Paulo - Campus de Ribeirao Preto
3. Universidade de Sao Paulo Faculdade de Medicina de Ribeirao Preto
4. Fundação Pio XII: Hospital de Cancer de Barretos
5. Universidade de Sao Paulo Campus de Sao Paulo: Universidade de Sao Paulo
6. Universidade Federal de Santa Catarina
Abstract
Purpose
To establish a reliable machine learning model to predict malignancy in breast lesions identified by ultrasound and optimize the negative predictive value to minimize unnecessary biopsies.
Methods
We included clinical and ultrasonographic attributes from 1526 breast lesions classified as BI-RADS 3, 4a, 4b, 4c, 5 and 6 that underwent ultrasound-guided breast biopsy at four institutions. We selected the most informative attributes to train nine machine learning models, ensemble models, and models with a tuned decision threshold, and used them to predict the diagnosis of BI-RADS 4a and 4b lesions (validation dataset). We then tested the performance of the final model on 403 new suspicious lesions.
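The abstract does not say how the threshold was tuned. As a rough illustration only, the sketch below assumes one plausible criterion: among candidate thresholds on a model's predicted malignancy score, keep the largest one whose NPV on held-out data still meets a target, so that as many lesions as possible are called benign. The function names, the `target_npv` parameter and the toy data are all hypothetical and do not come from the study.

```python
from typing import Optional, Sequence


def npv_at_threshold(
    scores: Sequence[float], labels: Sequence[int], threshold: float
) -> Optional[float]:
    """NPV when lesions scoring below `threshold` are called benign.

    labels: 1 = malignant, 0 = benign. Returns None if nothing is called benign.
    """
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tn + fn == 0:
        return None
    return tn / (tn + fn)


def tune_threshold(
    scores: Sequence[float], labels: Sequence[int], target_npv: float
) -> Optional[float]:
    """Largest observed score usable as a threshold while NPV >= target_npv.

    Assumed selection rule for illustration; not the authors' stated method.
    """
    best = None
    for t in sorted(set(scores)):
        npv = npv_at_threshold(scores, labels, t)
        if npv is not None and npv >= target_npv:
            best = t
    return best


# Toy validation data (made up): predicted malignancy scores and true labels.
toy_scores = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80, 0.90]
toy_labels = [0, 0, 0, 0, 1, 0, 1, 1]

chosen = tune_threshold(toy_scores, toy_labels, target_npv=1.0)
print("chosen threshold:", chosen)
```

In this toy case every lesion scoring below the chosen threshold is benign, so no cancer would be missed, while four of the five benign lesions would be spared a biopsy. A real analysis would tune the threshold on validation data and report performance on a separate test set, as the study describes.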
Results
The most informative attributes were the shape, margin, orientation and size of the lesion, the resistance index of the internal vessel, the age of the patient and the presence of a palpable lump. The highest mean negative predictive value (NPV) was achieved with k-nearest neighbors (KNN), at 97.9%. Building ensembles did not improve performance. Tuning the threshold did improve the performance of the models, and we chose XGBoost with the tuned threshold as the final model. On the test dataset, the final model achieved: NPV 98.1%, FN 1.9%, PPV 77.1%, FP 22.9%. Applying this final model, we would have missed 2 of the 231 malignant lesions in the test dataset (0.8%).
Conclusion
Machine learning can help physicians predict malignancy in suspicious breast lesions identified by ultrasound (US). Our final model would avoid 60.4% of the biopsies in benign lesions while missing less than 1% of the cancer cases.
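The reported test-set figures can be cross-checked with simple arithmetic. The abstract gives 403 test lesions, 231 of them malignant, and 2 missed cancers; it does not give the full confusion matrix. The sketch below therefore assumes a true-negative count of 104, chosen because it reproduces both the reported NPV and PPV. That count is an inference, not a number stated by the authors.

```python
# Figures stated in the abstract.
total_lesions = 403
malignant = 231
false_negatives = 2  # malignant lesions the final model would have missed

# Derived by subtraction.
benign = total_lesions - malignant            # 172
true_positives = malignant - false_negatives  # 229

# ASSUMPTION: true negatives are not reported; 104 is inferred because it
# reproduces the reported NPV (98.1%) and PPV (77.1%).
true_negatives = 104
false_positives = benign - true_negatives     # 68

npv = true_negatives / (true_negatives + false_negatives)
ppv = true_positives / (true_positives + false_positives)
benign_biopsies_avoided = true_negatives / benign
cancers_missed = false_negatives / malignant

print(f"NPV: {npv:.3%}")
print(f"PPV: {ppv:.3%}")
print(f"Benign biopsies avoided: {benign_biopsies_avoided:.3%}")
print(f"Cancers missed: {cancers_missed:.3%}")
```

Under this assumed count, NPV and PPV round to the reported 98.1% and 77.1%. The avoided-biopsy and missed-cancer fractions come out near 60.47% and 0.87%, which match the reported 60.4% and 0.8% if those values were truncated rather than rounded.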
Publisher
Research Square Platform LLC