Non-plug-in estimators could outperform plug-in estimators: a cautionary note and a diagnosis

Published: 2024-01-01
Volume: 13
Issue: 1
ISSN: 2161-962X
Container-title: Epidemiologic Methods
Language: en

Author:

Qiu Hongxiang¹

Affiliation:

1. Department of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, USA

Abstract

Objectives: Highly flexible nonparametric estimators have gained popularity in causal inference and epidemiology. Popular examples include targeted maximum likelihood estimators (TMLE) and double machine learning (DML) estimators. TMLE is often argued or suggested to be better than DML estimators and several other estimators in small to moderate samples, even when they share the same large-sample properties, because TMLE is a plug-in estimator and respects the known bounds on the parameter, whereas other estimators might fall outside those bounds and yield absurd estimates. However, this argument is not a rigorously proven result and may fail in certain cases.

Methods: In a carefully chosen simulation setting, I compare the performance of several versions of TMLE and DML estimators of the average treatment effect among the treated in small to moderate samples.

Results: In this simulation setting, DML estimators outperform some versions of TMLE in small samples. TMLE fluctuations are unstable, and hence empirically checking the magnitude of the TMLE fluctuation might flag cases where TMLE performs poorly.

Conclusions: As a plug-in estimator, TMLE is not guaranteed to outperform non-plug-in counterparts such as DML estimators in small samples. Checking the fluctuation magnitude might be a useful diagnostic for TMLE. More rigorous theoretical justification is needed to understand and compare the finite-sample performance of these highly flexible estimators in general.

Publisher

Walter de Gruyter GmbH

Link

https://www.degruyter.com/document/doi/10.1515/em-2024-0008/pdf
