Scalable collaborative targeted learning for high-dimensional data

Published:2017-09-22 Issue:2 Volume:28 Page:532-554
ISSN:0962-2802
Container-title:Statistical Methods in Medical Research
language:en
Short-container-title:Stat Methods Med Res

Author:

Ju Cheng¹, Gruber Susan², Lendle Samuel D¹, Chambaz Antoine¹,³, Franklin Jessica M⁴, Wyss Richard⁴, Schneeweiss Sebastian⁴, van der Laan Mark J¹

Affiliation:

1. University of California, Berkeley, CA, USA

2. Harvard Pilgrim Health Care Institute and Harvard Medical School, Boston, MA, USA

3. Modal’X, UPL, Univ Paris Nanterre, Nanterre, France

4. Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA

Abstract

Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well-behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them for the sake of achieving a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the general template of the collaborative targeted minimum loss-based estimation (C-TMLE) procedure. The original instantiation of the C-TMLE template can be presented as a greedy forward stepwise C-TMLE algorithm. It does not scale well when the number p of covariates increases drastically. This motivates the introduction of a novel instantiation of the C-TMLE template where the covariates are pre-ordered. Its time complexity is $\mathcal{O}(p)$ as opposed to the original $\mathcal{O}(p^2)$, a remarkable gain. We propose two pre-ordering strategies and suggest a rule of thumb to develop other meaningful strategies. Because it is usually unclear a priori which pre-ordering strategy to choose, we also introduce another instantiation called SL-C-TMLE algorithm that enables the data-driven choice of the better pre-ordering strategy given the problem at hand. Its time complexity is $\mathcal{O}(p)$ as well. The computational burden and relative performance of these algorithms were compared in simulation studies involving fully synthetic data or partially synthetic data based on a real-world large electronic health database, and in analyses of three real, large electronic health databases. In all analyses involving electronic health databases, the greedy C-TMLE algorithm is unacceptably slow. Simulation studies seem to indicate that our scalable C-TMLE and SL-C-TMLE algorithms work well. All C-TMLEs are publicly available in a Julia software package.
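
The scaling contrast between a greedy forward stepwise search and a pre-ordered search can be illustrated with a minimal Python sketch. This is not the authors' C-TMLE implementation or their Julia package. The functions `greedy_forward` and `preordered_forward`, the `toy_score` criterion, and the `WEIGHTS` values are all hypothetical, and the sketch assumes both searches run through all p covariates without early stopping. It only counts how many candidate models each strategy evaluates.

```python
from typing import Callable, List, Sequence, Tuple

Score = Callable[[Sequence[int]], float]


def greedy_forward(p: int, score: Score) -> Tuple[List[int], int]:
    """Greedy forward selection: each step scores every remaining covariate."""
    selected: List[int] = []
    remaining = list(range(p))
    evaluations = 0
    while remaining:
        scored = []
        for j in remaining:
            scored.append((score(selected + [j]), j))
            evaluations += 1  # one candidate model per remaining covariate
        _, best = max(scored)  # keep the covariate with the highest score
        selected.append(best)
        remaining.remove(best)
    return selected, evaluations


def preordered_forward(order: Sequence[int], score: Score) -> Tuple[List[int], int]:
    """Pre-ordered selection: covariates enter in a fixed order, one model per step."""
    selected: List[int] = []
    evaluations = 0
    for j in order:
        selected.append(j)
        score(selected)  # a single candidate model at this step
        evaluations += 1
    return selected, evaluations


# Hypothetical toy criterion: each covariate contributes a fixed weight.
WEIGHTS = [0.1, 0.9, 0.5, 0.3]


def toy_score(subset: Sequence[int]) -> float:
    return sum(WEIGHTS[j] for j in subset)


if __name__ == "__main__":
    p = len(WEIGHTS)
    order = sorted(range(p), key=lambda j: WEIGHTS[j], reverse=True)
    print(greedy_forward(p, toy_score))          # p(p+1)/2 evaluations
    print(preordered_forward(order, toy_score))  # p evaluations
```

Under these assumptions the greedy search evaluates p(p+1)/2 candidate models while the pre-ordered search evaluates p, which is the quadratic versus linear contrast the abstract describes.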

Publisher

SAGE Publications

Subject

Health Information Management,Statistics and Probability,Epidemiology

Link

http://journals.sagepub.com/doi/pdf/10.1177/0962280217729845

References: 36 articles.

1. Targeted Learning

2. Collaborative Double Robust Targeted Maximum Likelihood Estimation

3. Targeted Maximum Likelihood Estimation of Effect Modification Parameters in Survival Analysis

4. Finding quantitative trait loci genes with collaborative targeted maximum likelihood learning

5. Collaborative Targeted Maximum Likelihood for Time to Event Data

Cited by 23 articles.

1. Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review;Annals of Epidemiology;2023-10

2. Causal Inference with Targeted Learning for Producing and Evaluating Real-World Evidence;Real-World Evidence in Medical Product Development;2023

3. Stan and BART for Causal Inference: Estimating Heterogeneous Treatment Effects Using the Power of Stan and the Flexibility of Machine Learning;Entropy;2022-12-06

4. High‐dimensional propensity scores for empirical covariate selection in secondary database studies: Planning, implementation, and reporting;Pharmacoepidemiology and Drug Safety;2022-11-22

5. Machine learning for improving high‐dimensional proxy confounder adjustment in healthcare database studies: An overview of the current literature;Pharmacoepidemiology and Drug Safety;2022-07-05