Abstract
Cancer genome data have been growing in both size and complexity, driven primarily by advances in next-generation sequencing technologies and by large-scale resources such as the pan-cancer datasets from TCGA and ICGC and single-cell sequencing studies. Yet discerning the functional role of individual genomic lesions remains a substantial challenge because of the complexity and scale of these data. We previously introduced REVEALER, which identifies groups of genomic alterations that significantly associate with target functional profiles or phenotypes, such as pathway activation, gene dependency, or drug response. Here we present a new mathematical formulation of the algorithm. This version, REVEALER 2.0, is considerably more powerful than the original: it processes and analyzes much larger datasets rapidly and enables higher-resolution discoveries at the level of individual alleles. REVEALER 2.0 uses the Conditional Information Coefficient (CIC) to identify features that are complementary or mutually exclusive yet still correlate with the target functional profile. Taken together, these features explain the target functional profile better than any single alteration alone, as expected when several distinct activating genomic lesions can initiate or stimulate the same key pathway or process. We replaced the original three-dimensional kernel estimation with multiple precomputed one-dimensional kernel estimations, yielding an approximately 150-fold increase in speed. This efficiency makes REVEALER 2.0 suitable for much larger datasets and a broader range of genomic problems.
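The abstract does not spell out how the three-dimensional estimate is broken into one-dimensional ones, so the Python sketch below shows one way such a decomposition can work; it is an illustration, not the REVEALER 2.0 implementation. It assumes a binary candidate alteration x, a binary feature z summarizing alterations already selected, a continuous target profile y, Gaussian kernels with Silverman's bandwidth evaluated on a fixed grid, and the information-coefficient mapping sign * sqrt(1 - exp(-2I)) with the sign taken from the covariance of x and y. Under these assumptions I(X;Y|Z) = sum over z of p(z) * sum over x of p(x|z) * KL(f(y|x,z) || f(y|z)), so only one-dimensional densities of y are ever needed, and each is simply an average of per-sample kernels precomputed once for the target. All function names are hypothetical.

```python
import math


def silverman_bandwidth(values):
    """Silverman's rule-of-thumb bandwidth for a 1-D Gaussian kernel."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / max(n - 1, 1))
    # Arbitrary fallback for a constant target (illustration only).
    return 1.06 * sd * n ** (-0.2) if sd > 0 else 1.0


def precompute_kernels(y, n_grid=200):
    """Evaluate one Gaussian kernel per sample on a shared grid.

    Computed once per target profile y and reused for every candidate feature.
    """
    h = silverman_bandwidth(y)
    lo, hi = min(y) - 3 * h, max(y) + 3 * h
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + k * step for k in range(n_grid)]
    norm = 1.0 / (h * math.sqrt(2 * math.pi))
    kernels = [[norm * math.exp(-0.5 * ((g - yi) / h) ** 2) for g in grid]
               for yi in y]
    return kernels, step


def subgroup_density(kernels, idx):
    """1-D KDE of y over the samples in idx: mean of their precomputed kernels."""
    return [sum(kernels[i][k] for i in idx) / len(idx)
            for k in range(len(kernels[0]))]


def conditional_mi(x, z, pre):
    """Estimate I(X;Y|Z) in nats for binary x, z; y enters only via pre."""
    kernels, step = pre
    n = len(kernels)
    total = 0.0
    for zv in (0, 1):
        in_z = [i for i in range(n) if z[i] == zv]
        if not in_z:
            continue
        f_z = subgroup_density(kernels, in_z)  # f(y | z)
        for xv in (0, 1):
            in_xz = [i for i in in_z if x[i] == xv]
            if not in_xz:
                continue
            f_xz = subgroup_density(kernels, in_xz)  # f(y | x, z)
            # Riemann-sum approximation of KL(f(y|x,z) || f(y|z)) on the grid.
            kl = step * sum(a * math.log(a / b)
                            for a, b in zip(f_xz, f_z) if a > 0 and b > 0)
            total += (len(in_z) / n) * (len(in_xz) / len(in_z)) * kl
    return max(total, 0.0)


def cic(x, y, z, pre=None):
    """Signed coefficient sign * sqrt(1 - exp(-2 * I(X;Y|Z)))."""
    pre = pre if pre is not None else precompute_kernels(y)
    mi = conditional_mi(x, z, pre)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sign = 1.0 if cov >= 0 else -1.0  # assumed sign convention
    return sign * math.sqrt(1.0 - math.exp(-2.0 * mi))


# Usage on toy data (not from the paper).
y = [0.1 * i for i in range(20)]   # toy target profile
x = [0] * 10 + [1] * 10            # toy alteration present in high-y samples
pre = precompute_kernels(y)        # computed once, reused by both calls
print(cic(x, y, [0] * 20, pre))    # positive: x tracks y
print(cic(x, y, x, pre))           # 0.0: conditioning on x itself
```

Conditioning on z = x gives exactly zero, which mirrors the idea that a candidate adds nothing once an equivalent feature is already in the set, whereas a complementary alteration would typically retain a nonzero CIC. Reusing `pre` across candidates is what keeps each evaluation one-dimensional.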
Publisher
Cold Spring Harbor Laboratory