Affiliation:
1. Hubei Key Laboratory of Applied Mathematics, School of Mathematics and Statistics, Hubei University, Wuhan, China
Abstract
In recent years, non‐probability samples, such as web survey samples, have become increasingly popular in many fields, but they may be subject to selection bias, which complicates inference from them. Doubly robust (DR) estimation is one approach to making inferences from non‐probability samples. When many covariates are available, variable selection becomes important in DR estimation. In this paper, a new DR estimator for the finite population mean is constructed, in which intertwined probabilistic factors decoupling (IPAD) and a modified IPAD are used to select important variables in the propensity score model and the outcome superpopulation model, respectively. Unlike traditional variable selection approaches, such as the adaptive least absolute shrinkage and selection operator (LASSO) and smoothly clipped absolute deviation (SCAD), IPAD and the modified IPAD not only select important variables and estimate parameters but also control the false discovery rate, which can yield more accurate population estimates. Asymptotic theory and variance estimation for the DR estimator with the modified IPAD are established. Results from simulation studies indicate that the proposed estimator performs well. We apply the proposed method to the analysis of the Pew Research Center data and the Behavioral Risk Factor Surveillance System data.
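For readers unfamiliar with DR estimation, the following is a minimal sketch of a generic doubly robust estimator of a finite population mean, not the estimator proposed in the paper. It assumes a non‐probability sample with estimated propensity scores and a reference probability sample with design weights. The function name `dr_mean` and all argument names are hypothetical, and the IPAD variable selection step is not shown: the propensity scores and outcome-model predictions are taken as already fitted.

```python
import numpy as np


def dr_mean(y_a, m_a, pi_a, m_b, d_b):
    """Generic doubly robust estimate of a finite population mean.

    y_a  : observed outcomes in the non-probability sample
    m_a  : outcome-model predictions for the non-probability sample
    pi_a : estimated propensity scores for the non-probability sample
    m_b  : outcome-model predictions for the reference probability sample
    d_b  : design weights of the reference probability sample
    (All names are illustrative assumptions, not taken from the paper.)
    """
    y_a, m_a, pi_a = (np.asarray(v, dtype=float) for v in (y_a, m_a, pi_a))
    m_b, d_b = (np.asarray(v, dtype=float) for v in (m_b, d_b))

    w_a = 1.0 / pi_a      # inverse-propensity weights
    n_hat_a = w_a.sum()   # population size estimated from the non-probability sample
    n_hat_b = d_b.sum()   # population size estimated from the reference sample

    # Propensity-weighted residual correction plus model-based prediction term.
    correction = np.sum(w_a * (y_a - m_a)) / n_hat_a
    prediction = np.sum(d_b * m_b) / n_hat_b
    return correction + prediction
```

When the outcome predictions match the observed outcomes exactly, the correction term vanishes and the estimate reduces to the weighted mean of the predictions in the reference sample. When the predictions are all zero, it reduces to an inverse-propensity-weighted mean.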
Funder
National Social Science Fund of China
Subject
Computer Science Applications, Information Systems, Analysis