Authors:
Bareinboim Elias, Pearl Judea
Abstract
We review concepts, principles, and tools that unify current approaches to causal analysis and attend to new challenges presented by big data. In particular, we address the problem of data fusion—piecing together multiple datasets collected under heterogeneous conditions (i.e., different populations, regimes, and sampling methods) to obtain valid answers to queries of interest. The availability of multiple heterogeneous datasets presents new opportunities to big data analysts, because the knowledge that can be acquired from combined data would not be possible from any individual source alone. However, the biases that emerge in heterogeneous environments require new analytical tools. Some of these biases, including confounding, sampling selection, and cross-population biases, have been addressed in isolation, largely in restricted parametric models. We here present a general, nonparametric framework for handling these biases and, ultimately, a theoretical solution to the problem of data fusion in causal inference tasks.
Publisher
Proceedings of the National Academy of Sciences
Reference
55 articles.
Cited by
330 articles.