Nonparametric targeted Bayesian estimation of class proportions in unlabeled data

Published:2020-06-11 Issue:1 Volume:23 Page:274-293
ISSN:1465-4644
Container-title:Biostatistics
language:en

Author:

Díaz Iván¹, Savenkov Oleksander¹, Kamel Hooman²

Affiliation:

1. Division of Biostatistics, Weill Cornell Medicine, New York, NY 10065, USA

2. Department of Neurology, Weill Cornell Medicine, New York, NY 10065, USA

Abstract

We introduce a novel Bayesian estimator for the class proportion in an unlabeled dataset, based on the targeted learning framework. The procedure requires the specification of a prior (and outputs a posterior) only for the target of inference, and yields a tightly concentrated posterior. When the scientific question can be characterized by a low-dimensional parameter functional, this focus on target prior and posterior distributions perfectly aligns with Bayesian subjectivism. We prove a Bernstein–von Mises-type result for our proposed Bayesian procedure, which guarantees that the posterior distribution converges to the distribution of an efficient, asymptotically linear estimator. In particular, the posterior is Gaussian, doubly robust, and efficient in the limit, under the only assumption that certain nuisance parameters are estimated at slower-than-parametric rates. We perform numerical studies illustrating the frequentist properties of the method. We also illustrate its use in a motivating application to estimate the proportion of embolic strokes of undetermined source arising from occult cardiac sources or large-artery atherosclerotic lesions. Though we focus on the motivating example of the proportion of cases in an unlabeled dataset, the procedure is general and can be adapted to estimate any pathwise differentiable parameter in a nonparametric model.

Publisher

Oxford University Press (OUP)

Subject

Statistics, Probability and Uncertainty; General Medicine; Statistics and Probability

Link

https://academic.oup.com/biostatistics/article-pdf/23/1/274/42208880/kxaa022.pdf


Cited by 1 article.

1. Machine Learning Prediction of Stroke Mechanism in Embolic Strokes of Undetermined Source;Stroke;2020-09