Author:
Dutta Diptavo, Sen Ananda, Satagopan Jaya
Abstract
Background: Copy number aberrations (CNAs) have proved to be of clinical and therapeutic significance for many diseases, including breast cancer, because they drive numerous key biological processes by regulating molecular phenotypes such as gene expression. To comprehensively assess the effect of CNAs, it is not sufficient to identify significant CNA-gene expression pairs; one must also identify the overall gene networks and regulatory structures that CNAs influence and that subsequently produce changes in outcomes.
Methods: In this article, we adopt a two-step analysis approach to identify CNA-regulated genes whose expression levels affect breast cancer-related outcomes: (1) we identify gene modules regulated by CNAs through sparse canonical correlation analysis (sCCA), which selects a set of closely located CNAs that regulate the expression levels of the selected genes; (2) we then use a generalized linear model to identify which genes within the gene modules are associated with breast cancer-related outcomes.
Results: Analyzing clinical and genomic data on 1904 breast cancer patients from the METABRIC study, we found 14 gene modules to be regulated by groups of proximally located CNA sites. The identification of gene modules was further validated using independent data on individuals in a study of breast invasive carcinoma from The Cancer Genome Atlas (TCGA). Association analysis on 7 different breast cancer-related outcomes identified several novel and interpretable regulatory associations that highlight how CNAs can impact key biological pathways and processes in the context of breast cancer. Through downstream analysis of two example outcomes, estrogen receptor status and overall survival, we show that the identified genes were enriched in relevant biological pathways. The key advantage of our method is that we additionally identify the CNAs that regulate these genes. Because multiple types of outcomes were available, we further meta-analyzed the results to identify genes with potential associations with multiple outcomes.
Conclusions: Overall, we present a generalizable analysis approach that identifies genes associated with different outcomes and regulated by sets of CNAs, and that can further be used to combine results across various types of outcomes. The results show that our method can identify novel and interpretable associations, providing mechanistic insights into how the effects of CNAs are cascaded via gene expression to impact breast cancer and related outcomes.
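The two-step approach described in the Methods can be illustrated with a minimal sketch on simulated data. This is not the authors' implementation: the abstract does not specify the sCCA algorithm or software, so the sketch assumes a rank-1 sparse CCA fitted by alternating soft-thresholded power iterations, followed by a per-gene logistic regression (one instance of a generalized linear model) for a binary outcome such as estrogen receptor status. All function names, penalty values, and data dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(a, lam):
    # Shrink entries toward zero; entries with |a| <= lam become exactly 0.
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def standardize(m):
    return (m - m.mean(axis=0)) / m.std(axis=0)

def sparse_cca_rank1(x, y, lam_x, lam_y, n_iter=100):
    """Rank-1 sparse CCA sketch (assumed algorithm, not taken from the paper).

    Alternates soft-thresholded power iterations on the cross-correlation
    matrix, treating within-set covariances as identity.
    """
    c = standardize(x).T @ standardize(y) / x.shape[0]
    v = np.linalg.svd(c, full_matrices=False)[2][0]  # leading right singular vector
    u = np.zeros(c.shape[0])
    for _ in range(n_iter):
        u = soft_threshold(c @ v, lam_x)
        if not u.any():
            break
        u = u / np.linalg.norm(u)
        v = soft_threshold(c.T @ u, lam_y)
        if not v.any():
            break
        v = v / np.linalg.norm(v)
    return u, v

def logistic_wald_z(x, y, n_iter=25):
    """Wald z-statistic for the slope of a one-covariate logistic regression."""
    design = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):  # Newton-Raphson (IRLS) updates
        p = 1.0 / (1.0 + np.exp(-design @ beta))
        hessian = design.T @ (design * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, design.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-design @ beta))
    hessian = design.T @ (design * (p * (1.0 - p))[:, None])
    return beta[1] / np.sqrt(np.linalg.inv(hessian)[1, 1])

# Simulated data (hypothetical sizes): one latent factor drives the first
# 4 CNA sites and the first 5 genes; the rest are independent noise.
n, n_cna, n_gene = 500, 20, 30
latent = rng.normal(size=n)
cna = rng.normal(size=(n, n_cna))
cna[:, :4] = latent[:, None] + 0.5 * rng.normal(size=(n, 4))
expr = rng.normal(size=(n, n_gene))
expr[:, :5] = latent[:, None] + 0.5 * rng.normal(size=(n, 5))
# Binary outcome (stand-in for ER status) depends on gene 0's expression.
er_status = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.5 * expr[:, 0])))

# Step 1: select a CNA set and the gene module it co-varies with.
u, v = sparse_cca_rank1(cna, expr, lam_x=0.4, lam_y=0.4)
cna_idx = np.flatnonzero(u)
gene_idx = np.flatnonzero(v)

# Step 2: test each module gene for association with the outcome.
z_scores = {int(g): logistic_wald_z(standardize(expr[:, g]), er_status)
            for g in gene_idx}
print("selected CNA sites:", cna_idx)
print("module genes and Wald z:", z_scores)
```

In this sketch the sparsity penalties determine how many CNA sites and genes enter a module; in practice they would be tuned (for example by cross-validation or permutation), and further modules would be extracted after removing the first component.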
Publisher
Cold Spring Harbor Laboratory