Affiliation:
1. Department of Mathematical Sciences Michigan Technological University Houghton Michigan USA
2. Department of Statistics Virginia Polytechnic Institute and State University Blacksburg Virginia USA
Abstract
Variable selection and graphical modeling play essential roles in highly correlated and high‐dimensional (HCHD) data analysis. Variable selection methods have been developed under both parametric and nonparametric model settings. However, variable selection for nonadditive, nonparametric regression with high‐dimensional variables is challenging due to complications in modeling unknown dependence structures among HCHD variables. Gaussian graphical models are a popular and useful tool for investigating the conditional dependence between variables via estimating sparse precision matrices. For a given class of interest, the estimated precision matrices can be mapped onto networks for visualization. However, the limitation of Gaussian graphical models is that they are only applicable to discretized response variables and for the case when , where is the number of variables and is the sample size. They are necessary to develop a joint method for variable selection and graphical modeling. To the best of our knowledge, the methods for simultaneously selecting variable selection and estimating networks among variables in the semiparametric regression settings are quite limited. Hence, in this paper, we develop a joint semiparametric kernel network regression method to solve this limitation and to provide a connection between them. Our approach is a unified and integrated method that can simultaneously identify important variables and build a network among those variables. We developed our approach under a semiparametric kernel machine regression framework, which can allow for nonlinear or nonadditive associations and complicated interactions among the variables. The advantages of our approach are that it can (1) simultaneously select variables and build a network among HCHD variables under a regression setting; (2) model unknown and complicated interactions among the variables and estimate the network among these variables; (3) allow for any form of semiparametric model, including non‐additive, nonparametric model; and (4) provide an interpretable network that considers important variables and a response variable. We demonstrate our approach using a simulation study and real application on genetic pathway‐based analysis.
Funder
National Science Foundation of Sri Lanka
Subject
Statistics and Probability,Epidemiology