Affiliation:
1. Department of Mathematics, University of California San Diego, 9500 Gilman Drive, La Jolla, California 92093-0112, U.S.A
Abstract
Abstract
A fundamental challenge in semi-supervised learning lies in the observed data’s disproportional size when compared with the size of the data collected with missing outcomes. An implicit understanding is that the dataset with missing outcomes, being significantly larger, ought to improve estimation and inference. However, it is unclear to what extent this is correct. We illustrate one clear benefit: root-n inference of the outcome’s mean is possible while only requiring a consistent estimation of the outcome, possibly at a rate slower than root-n. This is achieved by a novel k-fold cross-fitted, double robust estimator. We discuss both linear and nonlinear outcomes. Such an estimator is particularly suited for models that naturally do not admit root-n consistency, such as high-dimensional, nonparametric, or semiparametric models. We apply our methods to the heterogeneous treatment effects.
Publisher
Oxford University Press (OUP)
Subject
Applied Mathematics,Statistics, Probability and Uncertainty,General Agricultural and Biological Sciences,Agricultural and Biological Sciences (miscellaneous),General Mathematics,Statistics and Probability
Cited by
4 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献