Abstract
Background
Breast cancer is the most commonly diagnosed cancer worldwide, surpassing lung cancer even though it occurs predominantly in women. One in four cancer cases among women is a breast cancer, and breast cancer is also the leading cause of cancer death in women. Reliable options for early detection of breast cancer are needed.

Methods
Using public-domain datasets, we screened transcriptomic profiles of breast cancer samples and identified progression-significant genes using stage-informed linear and ordinal models. We then applied a sequence of machine learning techniques, namely feature selection, principal components analysis, and k-means clustering, to train a learner to discriminate ‘cancer’ from ‘normal’ based on the expression levels of the identified biomarkers.

Results
Our computational pipeline yielded an optimal set of nine biomarker features for training the learner, namely NEK2, PKMYT1, MMP11, CPA1, COL10A1, HSD17B13, CA4, MYOC, and LYVE1. Validation of the learned model on an internal test set yielded an accuracy of 99.5%. Blind validation on an external dataset yielded a balanced accuracy of 95.5%, demonstrating that the model has effectively reduced the dimensionality of the problem and learnt the solution. The model was rebuilt using the full dataset and then deployed as a web app for non-profit purposes at: https://apalania.shinyapps.io/brcadx/. To our knowledge, this is the best-performing freely available tool for the high-confidence diagnosis of breast cancer, and it represents a promising aid to medical diagnosis.
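The abstract names three steps (feature selection, principal components analysis, k-means clustering) and reports balanced accuracy, but does not give their settings. The following is a minimal, dependency-free sketch of how such a pipeline could be chained, not the authors' implementation. Everything specific here is an assumption made for illustration: the synthetic expression data, the standardized mean-difference filter used for feature selection, the use of only the first principal component (found by power iteration), k = 2 clusters, and all function names. It also scores the same samples it was fitted on, whereas the abstract reports results on a held-out internal test set and an external dataset.

```python
import math
import random


def make_data(n_per_class=40, n_genes=20, seed=0):
    """Synthetic expression matrix (assumption): genes 0 and 1 are up and
    gene 2 is down in 'cancer'; the remaining genes are pure noise."""
    rng = random.Random(seed)
    shift = {0: 3.0, 1: 3.0, 2: -3.0}
    X, y = [], []
    for label in (0, 1):  # 0 = normal, 1 = cancer
        for _ in range(n_per_class):
            X.append([rng.gauss(shift.get(g, 0.0) * label, 1.0)
                      for g in range(n_genes)])
            y.append(label)
    return X, y


def select_features(X, y, k):
    """Rank genes by |difference of class means| / pooled SD; keep the top k."""
    scores = []
    for g in range(len(X[0])):
        a = [row[g] for row, lab in zip(X, y) if lab == 0]
        b = [row[g] for row, lab in zip(X, y) if lab == 1]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
        vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
        scores.append((abs(ma - mb) / math.sqrt((va + vb) / 2), g))
    return sorted(g for _, g in sorted(scores, reverse=True)[:k])


def first_pc_scores(X, genes, n_iter=100):
    """Project samples onto the first principal component (power iteration)."""
    n, k = len(X), len(genes)
    means = [sum(row[g] for row in X) / n for g in genes]
    Z = [[row[g] - m for g, m in zip(genes, means)] for row in X]
    cov = [[sum(z[i] * z[j] for z in Z) / (n - 1) for j in range(k)]
           for i in range(k)]
    v = [1.0 / (i + 1) for i in range(k)]  # arbitrary non-zero start vector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(zi * vi for zi, vi in zip(z, v)) for z in Z]


def two_means(scores, n_iter=50):
    """1-D k-means with k = 2, initialised at the two extreme scores."""
    centres = [min(scores), max(scores)]
    assign = [0] * len(scores)
    for _ in range(n_iter):
        assign = [0 if abs(s - centres[0]) <= abs(s - centres[1]) else 1
                  for s in scores]
        for j in (0, 1):
            members = [s for s, a in zip(scores, assign) if a == j]
            if members:
                centres[j] = sum(members) / len(members)
    return assign


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on cancer) and specificity (recall on normal)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)
    neg = len(y_true) - pos
    return 0.5 * (tp / pos + tn / neg)


X, y = make_data()
genes = select_features(X, y, k=3)
clusters = two_means(first_pc_scores(X, genes))

# Clusters carry no names, so label each one by the majority class of its members.
cluster_label = {}
for c in (0, 1):
    labs = [lab for lab, a in zip(y, clusters) if a == c]
    cluster_label[c] = int(2 * sum(labs) >= len(labs))
y_pred = [cluster_label[a] for a in clusters]

print("selected genes:", genes)
print("balanced accuracy:", round(balanced_accuracy(y, y_pred), 3))
```

Balanced accuracy averages sensitivity and specificity, so it is not inflated when one class (for example, tumour samples) greatly outnumbers the other, which is a plausible reason for reporting it on an external dataset.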
Publisher
Cold Spring Harbor Laboratory