Affiliation:
1. Department of Statistics, George Washington University, Washington, DC
2. Department of Statistics, Purdue University, West Lafayette, Indiana
3. Department of Population Health, New York University, New York, New York
Abstract
Estimating causal effects from large experimental and observational data has become increasingly prevalent in both industry and research. The bootstrap is an intuitive and powerful technique for constructing standard errors and confidence intervals of estimators. Its application, however, can be prohibitively demanding in settings involving large data. In addition, modern causal inference estimators based on machine learning and optimization techniques exacerbate the computational burden of the bootstrap. The bag of little bootstraps has been proposed in non-causal settings for large data but has not yet been applied to evaluate the properties of estimators of causal effects. In this article, we introduce a new bootstrap algorithm, called the causal bag of little bootstraps, for causal inference with large data. The new algorithm significantly improves the computational efficiency of the traditional bootstrap while providing consistent estimates and desirable confidence interval coverage. We describe its properties, provide practical considerations, and evaluate the performance of the proposed algorithm in terms of bias, coverage of the 95% confidence intervals, and computational time in a simulation study. We apply it to evaluate the effect of hormone therapy on the average time to coronary heart disease using a large observational data set from the Women's Health Initiative.
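For orientation, the generic bag of little bootstraps that the abstract builds on can be sketched as follows. This is a minimal illustration under stated assumptions, not the causal algorithm proposed in the article: it assumes a randomized binary treatment and a plain difference-in-means estimator, and every name in it (`blb_interval`, `weighted_diff_in_means`, `simulate_trial`, the defaults `gamma=0.7`, `n_subsets=5`, `n_resamples=50`) is hypothetical. The idea is to draw a few small subsets of size b = n^gamma, resample each one up to the full size n using multinomial counts, and average the resulting interval endpoints across subsets.

```python
import random
from collections import Counter


def weighted_diff_in_means(t, y, w):
    """Weighted mean outcome of the treated minus that of the controls."""
    w1 = sum(wi for ti, wi in zip(t, w) if ti == 1)
    w0 = sum(wi for ti, wi in zip(t, w) if ti == 0)
    if w1 == 0 or w0 == 0:
        return None  # one arm is absent from this resample
    m1 = sum(wi * yi for ti, yi, wi in zip(t, y, w) if ti == 1) / w1
    m0 = sum(wi * yi for ti, yi, wi in zip(t, y, w) if ti == 0) / w0
    return m1 - m0


def quantile(sorted_vals, q):
    """Simple nearest-rank quantile of an already sorted list."""
    return sorted_vals[int(round(q * (len(sorted_vals) - 1)))]


def blb_interval(t, y, n_subsets=5, gamma=0.7, n_resamples=50,
                 alpha=0.05, seed=0):
    """Generic bag of little bootstraps percentile interval (illustrative)."""
    rng = random.Random(seed)
    n = len(y)
    b = int(n ** gamma)  # subset size, much smaller than n
    lowers, uppers = [], []
    for _ in range(n_subsets):
        subset = rng.sample(range(n), b)  # b distinct units, no replacement
        ts = [t[i] for i in subset]
        ys = [y[i] for i in subset]
        estimates = []
        for _ in range(n_resamples):
            # A size-n resample of the subset, represented as counts over
            # its b units, so the estimator only touches b rows.
            counts = Counter(rng.choices(range(b), k=n))
            w = [counts[j] for j in range(b)]
            est = weighted_diff_in_means(ts, ys, w)
            if est is not None:
                estimates.append(est)
        estimates.sort()
        lowers.append(quantile(estimates, alpha / 2))
        uppers.append(quantile(estimates, 1 - alpha / 2))
    # Average the per-subset interval endpoints.
    return sum(lowers) / len(lowers), sum(uppers) / len(uppers)


def simulate_trial(n, effect, seed=1):
    """Hypothetical toy data: randomized treatment, constant additive effect."""
    rng = random.Random(seed)
    t = [rng.randint(0, 1) for _ in range(n)]
    y = [1.0 + effect * ti + rng.gauss(0.0, 1.0) for ti in t]
    return t, y


t, y = simulate_trial(n=2000, effect=2.0)
print(blb_interval(t, y))
```

Because each resample is stored as counts over b units rather than n copied rows, the estimator is evaluated on far fewer distinct observations than in the traditional bootstrap, which is where the computational saving comes from. An observational analysis would replace the difference in means with a weighted causal estimator that accepts the same counts.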