Hierarchical classification-based pan-cancer methylation analysis to classify primary cancer
Published:2023-12-08
Issue:1
Volume:24
Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics
Author:
Yang Youpeng,Zeng Qiuhong,Liu Gaotong,Zheng Shiyao,Luo Tianyang,Guo Yibin,Tang Jia,Huang Yi
Abstract
Hierarchical classification categorizes data more specifically and breaks a large classification problem into subproblems, which improves prediction accuracy and predictive power for undefined categories while mitigating the impact of poor-quality data. Despite these advantages, it has rarely been applied to predicting primary cancer. To leverage the similarity of cancers and the specificity of methylation patterns among them, we developed the Cancer Hierarchy Classification Tool (CHCT) based on hierarchical classification, using methylation data from 30 cancer types and 8239 methylome samples downloaded from publicly available databases (The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO)). We used unsupervised clustering to divide the classification into subproblems and screened differentially methylated sites using analysis of variance (ANOVA), the Tukey-Kramer test, and the Boruta algorithm to construct a model for each classifier module. In validation, CHCT correctly classified 1568 of the 1660 cases in the test set, an accuracy of 94.46%. We further curated an independent validation cohort of 677 cancer samples from GEO and assigned each a diagnosis using CHCT, which showed high diagnostic potential with an average accuracy of 91.40%. Moreover, CHCT has predictive capability for cancer types beyond its original classifier scope, as shown on the medulloblastoma and pituitary tumor datasets. In summary, CHCT hierarchically classifies primary cancer by methylation profile, splitting a large-scale classification of 30 cancer types into ten smaller classification problems. These results indicate that hierarchical cancer classification has the potential to be an accurate and robust cancer classification method.
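The idea of splitting one large classification into smaller subproblems can be illustrated with a minimal two-level sketch. This is not the CHCT implementation: the class names, the group assignments, the toy methylation values, and the nearest-centroid classifier are all assumptions made for illustration. The abstract derives its groups by unsupervised clustering and selects features with ANOVA, the Tukey-Kramer test, and Boruta, none of which are reproduced here.

```python
from collections import defaultdict
from math import dist


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


class NearestCentroid:
    """Stand-in classifier (an assumption): pick the label with the closest centroid."""

    def fit(self, X, y):
        by_label = defaultdict(list)
        for row, label in zip(X, y):
            by_label[label].append(row)
        self.centroids_ = {lab: centroid(rows) for lab, rows in by_label.items()}
        return self

    def predict_one(self, row):
        return min(self.centroids_, key=lambda lab: dist(row, self.centroids_[lab]))


class HierarchicalClassifier:
    """Two levels: a root model picks a group, then that group's model picks the class."""

    def __init__(self, class_to_group):
        # Hypothetical, user-supplied hierarchy (not learned by clustering here).
        self.class_to_group = class_to_group

    def fit(self, X, y):
        groups = [self.class_to_group[label] for label in y]
        self.root_ = NearestCentroid().fit(X, groups)
        self.leaves_ = {}
        for group in set(groups):
            idx = [i for i, g in enumerate(groups) if g == group]
            self.leaves_[group] = NearestCentroid().fit(
                [X[i] for i in idx], [y[i] for i in idx]
            )
        return self

    def predict_one(self, row):
        group = self.root_.predict_one(row)
        return group, self.leaves_[group].predict_one(row)


# Toy methylation beta values at three sites (invented for illustration).
X = [
    [0.90, 0.10, 0.80], [0.85, 0.15, 0.75],  # type_a1
    [0.90, 0.10, 0.20], [0.88, 0.12, 0.25],  # type_a2
    [0.10, 0.90, 0.80], [0.15, 0.85, 0.78],  # type_b1
    [0.10, 0.90, 0.20], [0.12, 0.88, 0.22],  # type_b2
]
y = ["type_a1"] * 2 + ["type_a2"] * 2 + ["type_b1"] * 2 + ["type_b2"] * 2
hierarchy = {"type_a1": "group_a", "type_a2": "group_a",
             "type_b1": "group_b", "type_b2": "group_b"}

model = HierarchicalClassifier(hierarchy).fit(X, y)
prediction = model.predict_one([0.87, 0.13, 0.30])
test_accuracy = 1568 / 1660  # test-set counts reported in the abstract

print(prediction)
print(f"{test_accuracy:.2%}")
```

Each leaf model only has to separate the classes inside its own group, which is the sense in which a 30-class problem becomes several smaller ones. The last line recomputes the reported test-set accuracy from the counts given in the abstract.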
Funder
Sun Yat-sen University Tongchuang Intelligent Medical interdisciplinary talent training Foundation
Min-Yue Cooperative Research Fund
National Natural Science Foundation of China grants
Guangzhou Key Laboratory of Molecular and Functional Imaging for Clinical Translation
Guangdong Basic and Applied Basic Research Foundation
Medical Scientific Research Foundation of Guangdong Province
Research Foundation of Guangdong Provincial Reproductive Science Institute
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology