Author:
Eissa Noureldin S.,Khairuddin Uswah,Yusof Rubiyah
Abstract
Abstract
Background
DNA Methylation is one of the most important epigenetic processes that are crucial to regulating the functioning of the human genome without altering the DNA sequence. DNA Methylation data for cancer patients are becoming more accessible than ever, which is attributed to newer DNA sequencing technologies, notably, the relatively low-cost DNA microarray technology by Illumina Infinium. This technology makes it possible to study DNA methylation at hundreds of thousands of different loci. Currently, most of the research found in the literature focuses on the discovery of DNA methylation markers for specific cancer types. A relatively small number of studies have attempted to find unified DNA methylation biomarkers that can diagnose different types of cancer (pan-cancer classification).
Results
In this study, the aim is to conduct a pan-classification of cancer disease. We retrieved individual data for different types of cancer patients from The Cancer Genome Atlas (TCGA) portal. We selected data for many cancer types: Breast Cancer (BRCA), Ovary Cancer (OV), Stomach Cancer (STOMACH), Colon Cancer (COAD), Kidney Cancer (KIRC), Liver Cancer (LIHC), Lung Cancer (LUSC), Prostate Cancer (PRAD) and Thyroid cancer (THCA). The data was pre-processed and later used to build the required dataset. The system that we developed consists of two main stages. The purpose of the first stage is to perform feature selection and, therefore, decrease the dimensionality of the DNA methylation loci (features). This is accomplished using an unsupervised metaheuristic technique. As for the second stage, we used supervised machine learning and developed deep neural network (DNN) models to help classify the samples’ malignancy status and cancer type. Experimental results showed that compared to recently published methods, our proposed system achieved better classification results in terms of recall, and similar and higher results in terms of precision and accuracy. The proposed system also achieved an excellent receiver operating characteristic area under the curve (ROC AUC) values varying from 0.85 to 0.89.
Conclusions
This research presented an effective new approach to classify different cancer types based on DNA methylation data retrieved from TCGA. The performance of the proposed system was compared to recently published works, using different performance metrics. It provided better results, confirming the effectiveness of the proposed method for classifying different cancer types based on DNA methylation data.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Reference34 articles.
1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: globocan estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
2. Homrich GK, Andrade CF, Marchiori RC, Lidtke GDS, Martins FP, Santos JWAD. Prevalence of benign diseases mimicking lung cancer: experience from a university hospital of southern brazil. Tuberc Respir Dis. 2015;78(2):72–7.
3. Rath T, Atreya R, Geißdörfer W, Lang R, Nägel A, Neurath MF. A severe case of tuberculosis radiologically and endoscopically mimicking colorectal cancer with peritoneal carcinomatosis. Case Rep Gastroenterol. 2017. https://doi.org/10.1155/2017/6206951.
4. Watte G, Tonietto RG, Severo CB, Bello AG, de Mattos Oliveira F, Hochhegger B, Irion K, da Silva Moreira J, Severo LC. Infection mimicking cancer: retrospective analysis of 147 cases, emphasizing fungal etiology. Eur Respir J. 2014;44(58):2512.
5. Locke WJ, Guanzon D, Ma C, Liew YJ, Duesing KR, Fung K, Ross JP. DNA methylation cancer biomarkers: translation to the clinic. Front Genet. 2019;10:1150.
Cited by
7 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献