Abstract
Decades before the clinical onset of acute myelogenous leukemia (AML), epigenetic changes start to accumulate in its progenitor cells. Delineating these changes can improve risk stratification for patients and yield insights into AML etiology, dynamics and mechanisms. Towards this goal, we extracted “epigenetic signatures” through two parallel machine learning approaches: a supervised regression model using frequently mutated genes as labels, and an unsupervised topic modeling approach that factorizes covarying epigenetic changes into a small number of “topics”. First, we created regression models for DNMT3A and TET2, the two most frequently mutated epigenetic drivers in AML. Our models differentiated wild-type from mutant genotypes based on their downstream epigenetic impacts with high accuracy (AUROC 0.9 and 0.8, respectively). Methylation loci frequently selected by the models recapitulated known downstream pathways and identified several novel recurrent targets. Second, we used topic modeling to systematically factorize the high-dimensional methylation profiles into a latent space of 15 topics. We annotated the identified topics with biological and clinical features such as mutation status, prior malignancy and ELN criteria. Topic modeling successfully deconvoluted the combined effects of multiple upstream epigenetic drivers into individual topics, including those of relatively infrequent cytogenetic events, improving the methylation-based subtyping of AML. Furthermore, the topics revealed complementary and synergistic interactions between drivers, grouped drivers by the similarity of their downstream methylation impact and linked them to prognostic criteria.
Our models identify new signatures and methylation pathways, refine risk stratification and inform detection and drug response studies for AML patients.
Key Points
Supervised and unsupervised models reveal new methylation pathways of AML driver events and validate previously known associations.
Individual DNMT3A and TET2 signatures are accurate and robust, despite the complex genetic and epigenetic make-up of samples at diagnosis.
Unsupervised topic modeling factorizes covarying methylation changes and isolates methylation signatures caused by rare mutations.
Topic modeling reveals a group of mutations that have similar downstream methylation impacts and are mapped to the adverse-risk class by ELN.
Topic modeling uncovers methylation signatures of infrequent cytogenetic events, significantly improving methylation-based subtyping.
Our models can be leveraged to build predictive models for AML risk.
Our models show that cytogenetic events, such as t(15;17), have widespread trans downstream methylation impacts.
Publisher
Cold Spring Harbor Laboratory