Abstract
As the number of available sequencing data modalities increases, so does the potential biological insight that they can provide. Most existing methods to integrate co-profiled single-cell multi-omics data focus only on learning representations that capture stationary and shared information among these modalities. Current methods do not account for time-dependent and modality-specific information delineating cell states and subtypes, nor do they consider dynamics resulting from causal relations among modalities. For example, open chromatin may cause active transcription; however, it is also possible that gene expression responses lag behind changes in chromatin accessibility. To account for this time lag, the relationship between the epigenome and the transcriptome can be characterized as “coupled” (changing dependently) or “decoupled” (changing independently). We propose the framework HALO (Hierarchical cAusal representation Learning for Omics data), which adopts a causal approach to model these non-stationary causal relations using independent changing mechanisms in co-profiled single-cell ATAC- and RNA-seq data. Our model factorizes these two modalities into both coupled and decoupled latent representations, allowing us to identify the dynamic interplay between chromatin accessibility and transcription through temporal modulations. In blood lineage and developing mouse brain data, where the balance between proliferation and differentiation is tightly regulated, HALO distinguishes between coupled and decoupled genes and links them with disparate processes that constitute these two complementary states.
Publisher
Cold Spring Harbor Laboratory