Affiliation:
1. Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Belgium
2. Department of Medical Statistics, London School of Hygiene and Tropical Medicine, UK
Abstract
The problem of how to best select variables for confounding adjustment forms one of the key challenges in the evaluation of exposure or treatment effects in observational studies. Routine practice is often based on stepwise selection procedures that use hypothesis testing, change-in-estimate assessments or the lasso, which have all been criticised for – amongst other things – not giving sufficient priority to the selection of confounders. This has prompted vigorous recent activity in developing procedures that prioritise the selection of confounders, while preventing the selection of so-called instrumental variables that are associated with exposure, but not outcome (after adjustment for the exposure). A major drawback of all these procedures is that there is no finite sample size at which they are guaranteed to deliver treatment effect estimators and associated confidence intervals with adequate performance. This is the result of the estimator jumping back and forth between different selected models, and standard confidence intervals ignoring the resulting model selection uncertainty. In this paper, we will develop insight into this by evaluating the finite-sample distribution of the exposure effect estimator in linear regression, under a number of the aforementioned confounder selection procedures. We will show that by making clever use of propensity scores, a simple and generic solution is obtained in the context of generalized linear models, which overcomes this concern (under weaker conditions than competing proposals). Specifically, we propose to use separate regularized regressions for the outcome and propensity score models in order to construct a doubly robust ‘g-estimator’; when these models are sufficiently sparse and correctly specified, standard confidence intervals for the g-estimator implicitly incorporate the uncertainty induced by the variable selection procedure.
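For orientation, the doubly robust g-estimator mentioned above can be sketched in the simplest case of a linear outcome model with identity link. The notation below (Y for outcome, A for exposure, L for covariates, ψ for the exposure effect, β and γ for nuisance parameters) is assumed here for illustration and is not taken from the abstract; the exact estimator and penalties used in the paper may differ.

```latex
% Illustrative notation (assumed, not from the abstract):
%   Y outcome, A exposure, L covariates, \psi exposure effect.
% Working models, each fitted by its own regularized regression:
%   outcome:    E(Y \mid A, L) = \psi A + \beta^\top L
%   propensity: E(A \mid L) = \pi(L; \gamma), e.g. \operatorname{expit}(\gamma^\top L)
%
% The g-estimator solves the estimating equation
\[
\sum_{i=1}^{n} \{A_i - \pi(L_i;\hat\gamma)\}\,
               (Y_i - \psi A_i - \hat\beta^\top L_i) = 0 ,
\]
% which has the closed-form solution
\[
\hat\psi
= \frac{\sum_{i=1}^{n} \{A_i - \pi(L_i;\hat\gamma)\}\,(Y_i - \hat\beta^\top L_i)}
       {\sum_{i=1}^{n} \{A_i - \pi(L_i;\hat\gamma)\}\,A_i} .
\]
```

In this linear setting the estimating equation has mean zero at the true ψ if either the outcome model or the propensity score model is correctly specified, which is the sense in which the estimator is doubly robust.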
Subject
Health Information Management, Statistics and Probability, Epidemiology
Cited by
9 articles.