Abstract
AbstractMany cancer genomes are extensively rearranged with highly aberrant chromosomal karyotypes. Structural and copy number variations in cancer genomes can be determined via abnormal mapping of sequenced reads to the reference genome. Recently it became possible to reconcile both of these types of large-scale variations into a karyotype graph representation of the rearranged cancer genomes. Such a representation, however, does not directly describe the linear and/or circular structure of the underlying rearranged cancer chromosomes, thus limiting possible analysis of cancer genomes somatic evolutionary process as well as functional genomic changes brought by the large-scale genome rearrangements.Here we address the aforementioned limitation by introducing a novel methodological framework for recovering rearranged cancer chromosomes from karyotype graphs. For a cancer karyotype graph we formulate an Eulerian Decomposition Problem (EDP) of finding a collection of linear and/or circular rearranged cancer chromosomes that are determined by the graph. We derive and prove computational complexities for several variations of the EDP. We then demonstrate that Eulerian decomposition of the cancer karyotype graphs is not always unique and present the Consistent Contig Covering Problem (CCCP) of recovering unambiguous cancer contigs from the cancer karyotype graph, and describe a novel algorithm CCR capable of solving CCCP in polynomial time.We apply CCR on a prostate cancer dataset and demonstrate that it is capable of consistently recovering large cancer contigs even when underlying cancer genomes are highly rearranged. CCR can recover rearranged cancer contigs from karyotype graphs thereby addressing existing limitation in inferring chromosomal structures of rearranged cancer genomes and advancing our understanding of both patient/cancer-specific as well as the overall genetic instability in cancer.
Publisher
Cold Spring Harbor Laboratory