Affiliation:
1. Monterrey Institute of Technology and Higher Education
2. Hospital Infantil de México Federico Gómez
3. Brigham and Women's Hospital
Abstract
Background:
Breast cancer is the second leading cause of global female mortality. Diagnosing and treating breast cancer at early stages is important for successful treatment and for increasing patient survival. New analytical methods for massive data from biological samples, such as Machine Learning Algorithms (MLAs), are necessary for improving cancer diagnosis, especially for patients in low-income countries. A computational methodology for selecting a small number of biomarkers with strong diagnostic capabilities and an accessible cellular location could be useful for developing low-cost diagnostic devices. Hence, this study aimed to develop a computational methodology to find relevant genetic biomarkers and establish a discrete panel of genes capable of classifying breast cancer samples for diagnostic purposes with high accuracy.
Methods:
This study aimed to develop a computational methodology for finding genetic biomarkers and establishing a panel of a few genes capable of classifying breast cancer molecularly for diagnostic purposes. Panels with a small number of genes (<10) were used for the molecular classification of breast cancer cells through four Machine Learning Algorithms on transcriptomic data. Five gene selection approaches were used to generate these panels: factor analysis genes, surfaceome genes, transmembrane genes, combined genes, and network analysis genes. The classification performance was analyzed and validated using seven factorial designs and non-parametric statistical tests.
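As a rough illustration of scoring a small gene panel by classification accuracy, the following minimal Python sketch may help. It is not the authors' pipeline. The abstract does not name the four MLAs, so a nearest-centroid classifier with leave-one-out cross-validation stands in for them. The expression values are synthetic, and the class labels "A" and "B", the noise genes, and all function names are hypothetical. Only the nine gene symbols come from the abstract.

```python
import random
from statistics import mean

# The nine genes reported for the combined approach in the abstract.
PANEL = ["TMEM210", "CD44", "SPDEF", "TENM4", "KIRREL",
         "BCAS1", "TMEM86A", "LRFN2", "TFF3"]


def make_synthetic_samples(n_per_class=20, seed=0):
    """Build synthetic expression profiles (assumption: NOT real data).

    Class "B" is shifted upward on every panel gene; the extra
    NOISE genes carry no class signal.
    """
    rng = random.Random(seed)
    noise_genes = [f"NOISE{i}" for i in range(5)]
    profiles, labels = [], []
    for label, shift in (("A", 0.0), ("B", 2.0)):
        for _ in range(n_per_class):
            profile = {g: rng.gauss(5.0 + shift, 1.0) for g in PANEL}
            profile.update({g: rng.gauss(5.0, 1.0) for g in noise_genes})
            profiles.append(profile)
            labels.append(label)
    return profiles, labels


def centroid(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [mean(col) for col in zip(*rows)]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def loo_accuracy(profiles, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier
    that only sees the expression values of `genes`."""
    X = [[p[g] for g in genes] for p in profiles]
    classes = sorted(set(labels))
    correct = 0
    for i, (x, y) in enumerate(zip(X, labels)):
        # Fit class centroids on every sample except the held-out one.
        centroids = {
            cls: centroid([X[j] for j in range(len(X))
                           if j != i and labels[j] == cls])
            for cls in classes
        }
        predicted = min(classes, key=lambda c: squared_distance(x, centroids[c]))
        correct += predicted == y
    return correct / len(X)


profiles, labels = make_synthetic_samples()
panel_accuracy = loo_accuracy(profiles, labels, PANEL)
print(f"Leave-one-out accuracy on the 9-gene panel: {panel_accuracy:.2f}")
```

Passing a different gene list as `genes` (for example, only the hypothetical noise genes) gives a baseline against which the panel can be compared.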
Results:
The MLAs' accuracy was higher than 80% in cell lines and in patient samples for all selection approaches. The combined approach, which used the best genes of three approaches (transmembrane, surfaceome, and factor analysis), had better classification performance than each approach alone. Also, the combined genes of this approach (TMEM210, CD44, SPDEF, TENM4, KIRREL, BCAS1, TMEM86A, LRFN2, TFF3) had similar performance to the ones selected by network analysis. The panel of genes identified from the combined approach was completely different from the genes previously described in the four commercial breast cancer panels that were analyzed.
Conclusions:
In this study, the panels of selected genes were capable of classifying breast cancer cell lines and patient samples according to their molecular characteristics. Two genes of the combined approach (TFF3 and CD44) have been used in cancer biosensors, which suggests that the panel is plausible and has potential for the development of new diagnostic devices; however, experimental studies are required to corroborate this type of implementation.
Publisher
Research Square Platform LLC