Affiliation:
1. Department of Mathematics, Wellesley College, Wellesley, Massachusetts, USA
Abstract
Mediation analysis aims to uncover the relationship between an exposure variable and an outcome variable that operates through one or more intermediate variables called mediators. In recent decades, research on mediation analysis has focused on multivariate mediation models, in which the number of mediating variables may be large. This paper concerns high‐dimensional mediation analysis and proposes a three‐step algorithm that extracts and utilizes inter‐connectivity among candidate mediators. Specifically, the proposed methodology starts with a screening procedure that reduces the dimensionality of the initial set of candidate mediators, continues with a penalized regression model that incorporates both parameter‐ and group‐wise regularization, and ends by fitting a multivariate mediation model and identifying active mediating variables through a joint significance test. To showcase the performance of the proposed algorithm, we conducted two simulation studies, one in a high‐dimensional setting and one in an ultra‐high‐dimensional setting. Furthermore, we demonstrate the practical application of the proposal on a real data set, examining the possible impact of environmental toxicants on women's gestational age at delivery through 61 biomarkers that belong to 7 biological pathways.
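The final step named above, a joint significance test, can be illustrated with a minimal sketch. It assumes the common max-P form of the test, in which a mediator is declared active only when both its exposure-to-mediator path and its mediator-to-outcome path are significant, and it assumes a Bonferroni correction across mediators; the abstract does not specify either choice. The function names and the z statistics below are hypothetical and are not taken from the paper.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def joint_significance(z_alpha, z_beta, level=0.05):
    """Max-P joint significance test (assumed form, not the paper's code).

    z_alpha[k]: z statistic for the exposure -> mediator k path.
    z_beta[k]:  z statistic for the mediator k -> outcome path.
    Returns a list of (index, joint p, adjusted p, is_active) tuples.
    """
    if len(z_alpha) != len(z_beta):
        raise ValueError("z_alpha and z_beta must have the same length")
    m = len(z_alpha)
    results = []
    for k, (za, zb) in enumerate(zip(z_alpha, z_beta)):
        # Both paths must be significant, so take the larger p-value.
        p_joint = max(two_sided_p(za), two_sided_p(zb))
        # Bonferroni adjustment over the m candidates (an assumption).
        p_adj = min(1.0, p_joint * m)
        results.append((k, p_joint, p_adj, p_adj < level))
    return results


# Hypothetical z statistics for three candidate mediators.
z_alpha = [4.2, 0.3, 3.9]
z_beta = [3.8, 4.5, 0.5]
active = [k for k, _, _, keep in joint_significance(z_alpha, z_beta) if keep]
```

In this toy input only the first mediator has strong evidence on both paths, so it is the only one retained; the other two each fail on one path, which is the behavior that separates a joint test from testing either path alone.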