Affiliation:
1. Institute of Statistics and Big Data, Renmin University of China , Beijing , China
2. Department of Statistics, Texas A&M University , College Station, Texas , USA
Abstract
Abstract
Bayesian networks have been widely used to generate causal hypotheses from multivariate data. Despite their popularity, the vast majority of existing causal discovery approaches make the strong assumption of a (partially) homogeneous sampling scheme. However, such assumption can be seriously violated, causing significant biases when the underlying population is inherently heterogeneous. To this end, we propose a novel causal Bayesian network model, termed BN-LTE, that embeds heterogeneous samples onto a low-dimensional manifold and builds Bayesian networks conditional on the embedding. This new framework allows for more precise network inference by improving the estimation resolution from the population level to the observation level. Moreover, while causal Bayesian networks are in general not identifiable with purely observational, cross-sectional data due to Markov equivalence, with the blessing of causal effect heterogeneity, we prove that the proposed BN-LTE is uniquely identifiable under relatively mild assumptions. Through extensive experiments, we demonstrate the superior performance of BN-LTE in causal structure learning as well as inferring observation-specific gene regulatory networks from observational data.
Funder
National Institute of General Medical Sciences
National Natural Science Foundation of China
National Science Foundation
Publisher
Oxford University Press (OUP)
Subject
Applied Mathematics,General Agricultural and Biological Sciences,General Immunology and Microbiology,General Biochemistry, Genetics and Molecular Biology,General Medicine,Statistics and Probability