A review of the machine learning datasets in mammography, their adherence to the FAIR principles and the outlook for the future

Published:2023-09-08 Issue:1 Volume:10
ISSN:2052-4463
Container-title:Scientific Data
language:en
Short-container-title:Sci Data

Author:

Logan Joe, Kennedy Paul J., Catchpoole Daniel

Abstract

The increasing rates of breast cancer, particularly in emerging economies, have led to interest in scalable deep learning-based solutions that improve the accuracy and cost-effectiveness of mammographic screening. However, such tools require large volumes of high-quality training data, which can be challenging to obtain. This paper combines the experience of an AI startup with an analysis of the FAIR principles of the eight available datasets. It demonstrates that the datasets vary considerably, particularly in their interoperability, as each dataset is skewed towards a particular clinical use-case. Additionally, the mix of digital captures and scanned film compounds the problem of variability, along with differences in licensing terms, ease of access, labelling reliability, and file formats. Improving interoperability through adherence to standards such as the BIRADS criteria for labelling and annotation, and a consistent file format, could markedly improve access and use of larger amounts of standardized data. This, in turn, could be increased further by GAN-based synthetic data generation, paving the way towards better health outcomes for breast cancer.

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences; Statistics, Probability and Uncertainty; Computer Science Applications; Education; Information Systems; Statistics and Probability

Link

https://www.nature.com/articles/s41597-023-02430-6.pdf

References: 43 articles.

1. Carney, P. A. et al. Individual and combined effects of age, breast density, and hormone replacement therapy use on the accuracy of screening mammography. Annals of Internal Medicine 138, 168–175 (2003).

2. Lancet, T. Breast cancer in developing countries. The Lancet Oncology 374, 1077–1085 (2009).

3. Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 1–9 (2016).

4. Bishop, B. W., Hank, C. & Webster, J. The Data Life Aquatic. International Journal of Digital Curation 16, 10 (2022).

5. Heath, M., Bowyer, K., Kopans, D., Moore, R. & Kegelmeyer, P. The digital database for screening mammography. In Proceedings of the Fifth International Workshop on Digital Mammography, 212–218.

Cited by 7 articles.

1. Data Domain Adaptation in Federated Learning in the Breast Mammography Image Classification Problem;2024 16th International Conference on Human System Interaction (HSI);2024-07-08

2. Systematic Review of Challenges in Implementing Artificial Intelligence in Breast Cancer Screening Programs: Towards a Framework for Safe Adoption (Preprint);2024-06-05

3. Machine learning data practices through a data curation lens: An evaluation framework;The 2024 ACM Conference on Fairness, Accountability, and Transparency;2024-06-03

4. Deep Learning in Breast Cancer Imaging: State of the Art and Recent Advancements in Early 2024;Diagnostics;2024-04-19

5. Comparative Study of Artificial Intelligence Models for Breast Cancer Detection;Journal of Trends in Computer Science and Smart Technology;2024-03