Solving the missing at random problem in semi‐supervised learning: An inverse probability weighting method
Author:
Su Jin1,
Zhang Shuyi1,
Zhou Yong1
Affiliation:
1. Key Laboratory of Advanced Theory and Application in Statistics and Data Science, Ministry of Education, School of Statistics, Academy of Statistics and Interdisciplinary Sciences, East China Normal University, Shanghai, China
Abstract
We propose an estimator for the population mean under the semi‐supervised learning setting with the Missing at Random (MAR) assumption. In this setting, the probability of observing the response depends on the total sample size and diminishes as the sample size grows. To estimate the population mean efficiently, we introduce an adaptive estimator based on inverse probability weighting (IPW) and cross‐fitting. Theoretical analysis shows that the proposed estimator is consistent and efficient, with a convergence rate slower than the typical rate, because the proportion of labelled data diminishes as the sample size increases in the semi‐supervised setting. We also prove the consistency of IPW–Nadaraya–Watson density function estimators. Extensive simulations and an application to the Los Angeles homeless data validate the effectiveness of our approach.
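To make the idea of inverse probability weighting with cross‐fitting concrete, the following is a minimal Python sketch of a generic cross‐fitted IPW mean estimator. It is not the adaptive estimator of the paper. The logistic propensity model, the self‐normalised (Hajek) weighting, the number of folds, the simulated data design and all function names (`fit_logistic`, `predict_propensity`, `cross_fitted_ipw_mean`, `simulate`) are assumptions made for illustration, and the labelling probability here is held fixed rather than shrinking with the sample size.

```python
import numpy as np


def fit_logistic(X, r, n_iter=50, ridge=1e-6):
    """Fit P(R = 1 | X) with a logistic model via Newton's method (assumed model)."""
    Z = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (r - p)
        # Small ridge term keeps the Hessian invertible.
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def predict_propensity(beta, X):
    """Predicted labelling probabilities for the rows of X."""
    Z = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Z @ beta))


def cross_fitted_ipw_mean(X, y, r, n_folds=5, seed=0):
    """Cross-fitted, self-normalised IPW estimate of E[Y].

    r[i] is 1 if y[i] is observed and 0 otherwise; y may be NaN where r == 0.
    """
    rng = np.random.default_rng(seed)
    n = len(r)
    folds = rng.permutation(n) % n_folds  # balanced random fold labels
    pi_hat = np.empty(n)
    for k in range(n_folds):
        held_out = folds == k
        # Fit the propensity model on the other folds, predict on this one.
        beta = fit_logistic(X[~held_out], r[~held_out])
        pi_hat[held_out] = predict_propensity(beta, X[held_out])
    pi_hat = np.clip(pi_hat, 1e-6, 1.0)  # guard against division by zero
    weights = r / pi_hat
    y_filled = np.where(r == 1, y, 0.0)  # unlabelled rows get zero weight anyway
    return np.sum(weights * y_filled) / np.sum(weights)


def simulate(n, seed=0):
    """Hypothetical MAR design: labelling depends on X only; true E[Y] is 1."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, 1))
    y = 1.0 + 2.0 * X[:, 0] + rng.normal(size=n)
    pi = 1.0 / (1.0 + np.exp(-(-2.0 + X[:, 0])))
    r = (rng.uniform(size=n) < pi).astype(float)
    y = np.where(r == 1, y, np.nan)  # hide the unlabelled responses
    return X, y, r


X, y, r = simulate(50_000, seed=1)
estimate = cross_fitted_ipw_mean(X, y, r)
naive = np.nanmean(y)  # mean of the labelled responses only
print(f"cross-fitted IPW: {estimate:.3f}, labelled-only mean: {naive:.3f}")
```

In this simulated design, units with larger covariate values are more likely to be labelled, so the labelled‐only mean is biased upward, while the weighted estimate should land close to the true mean of 1. Fitting the propensity model on the other folds and predicting on the held‐out fold is what "cross‐fitting" refers to in this sketch.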
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
Science and Technology Commission of Shanghai Municipality
National Outstanding Youth Science Fund Project of National Natural Science Foundation of China
Natural Science Foundation of Shanghai Municipality