Affiliation:
1. Department of Biostatistics, Yale School of Public Health, New Haven, USA
2. Department of Biostatistics, Department of Management Sciences, and School of Data Science, City University of Hong Kong, Hong Kong, China
3. Faculty of Economics and Management, East China Normal University, Shanghai, China
Abstract
Sufficient dimension reduction (SDR) methods are effective tools for handling high‐dimensional data. Classical SDR methods are developed under the assumption that the data are completely observed. For incomplete data, SDR has been considered only when the values are missing at random, and not when they are nonignorably missing. The nonignorable case is arguably more difficult to handle, because the missingness mechanism depends on the unobserved values themselves. The purpose of this paper is to fill this void. We propose an intuitive, easy‐to‐implement SDR estimator, based on a semiparametric propensity score function, for response data with nonignorable missing values. We refer to it as the dimension reduction‐based imputed estimator. We establish the theoretical properties of this estimator and examine its empirical performance in an extensive numerical study on real and simulated data. We also compare its performance with that of two competing estimators: the fusion refined estimator and the cumulative slicing estimator. A distinguishing feature of our method is that it requires no validation sample. The SDR theory developed in this paper is a non‐trivial extension of the existing literature, owing to the technical challenges posed by nonignorable missingness. All technical proofs of the theorems are given in Appendix S1.
Funder
National Key Research and Development Program of China
National Natural Science Foundation of China
Subject
Statistics, Probability and Uncertainty; Statistics and Probability