Affiliation:
1. Department of Statistics and Data Science, Cornell University, Ithaca, New York, USA
2. Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, Pennsylvania, USA
3. Microsoft Research, Redmond, Washington, USA
Abstract
We propose a communication‐efficient algorithm to estimate the average treatment effect (ATE) when the data are distributed across multiple sites and the number of covariates is possibly much larger than the sample size at each site. Our main idea is to calibrate the estimates of the propensity score and outcome models using suitable surrogate loss functions so that they approximately attain the desired covariate balancing property. We show that, under possible model misspecification, our distributed covariate balancing propensity score estimator (disthdCBPS) approximates the global estimator, obtained by pooling the data from all sites, at a fast rate. Thus, our estimator remains consistent and asymptotically normal. In addition, when both the propensity score and the outcome models are correctly specified, the proposed estimator attains the semi‐parametric efficiency bound. We illustrate the empirical performance of the proposed method in both simulation and empirical studies.
Funder
National Institutes of Health
National Science Foundation