Regularized Bayesian transfer learning for population-level etiological distributions

Published:2020-02-10 Issue:4 Volume:22 Page:836-857
ISSN:1465-4644
Container-title:Biostatistics
language:en

Author:

Datta Abhirup¹, Fiksel Jacob¹, Amouzou Agbessi², Zeger Scott L¹

Affiliation:

1. Department of Biostatistics, Johns Hopkins University, 615 North Wolfe Street, Baltimore, MD 21205, USA

2. Department of International Health, Johns Hopkins University, 615 North Wolfe Street, Baltimore, MD 21205, USA

Abstract

Computer-coded verbal autopsy (CCVA) algorithms predict cause of death from high-dimensional family questionnaire data (verbal autopsy) about a deceased individual; these predictions are then aggregated to generate national and regional estimates of cause-specific mortality fractions. These estimates may be inaccurate if the CCVA algorithm is trained on non-local data that differ from the local population of interest. This problem is a special case of transfer learning, i.e., improving classification within a target domain (e.g., a particular population) with a classifier trained in a source domain. Most transfer learning approaches concern individual-level (e.g., a person’s) classification. Social and health scientists such as epidemiologists are often more interested in understanding etiological distributions at the population level. The sample sizes of their data sets are typically orders of magnitude smaller than those used for common transfer learning applications such as image classification and document identification. We present a parsimonious hierarchical Bayesian transfer learning framework to directly estimate population-level class probabilities in a target domain, using any baseline classifier trained on source-domain data and a small labeled target-domain data set. To address small sample sizes, we introduce a novel shrinkage prior for the transfer error rates. This prior guarantees that, in the absence of any labeled target-domain data or when the baseline classifier is perfectly accurate, our transfer learning estimate agrees with direct aggregation of predictions from the baseline classifier, thereby subsuming the default practice as a special case. We then extend our approach to use an ensemble of baseline classifiers producing a unified estimate. Theoretical and empirical results demonstrate how the ensemble model favors the most accurate baseline classifier. We present data analyses demonstrating the utility of our approach.
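
The calibration idea in the abstract can be illustrated with a small point-estimate sketch. This is not the authors' hierarchical Bayesian model: it is a simplified analogue that assumes hard class predictions, estimates a transfer error matrix from the labeled target-domain data, shrinks it toward the identity with diagonal pseudo-counts, and then solves for the population-level class fractions with EM updates. The function names `shrunk_error_matrix` and `calibrated_fractions` and the `strength` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def shrunk_error_matrix(true_labels, pred_labels, n_classes, strength=1.0):
    """Estimate M[i, j] = P(predicted j | true i), shrunk toward the identity.

    `strength` is a hypothetical pseudo-count placed on the diagonal; it
    stands in for a shrinkage prior and is not the prior used in the paper.
    """
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(true_labels, pred_labels):
        counts[t, p] += 1.0
    # With no labeled target data, counts is all zeros and M is exactly I.
    smoothed = counts + strength * np.eye(n_classes)
    return smoothed / smoothed.sum(axis=1, keepdims=True)


def calibrated_fractions(pred_unlabeled, M, n_iter=500):
    """Find class fractions p on the simplex such that q is close to M^T p.

    q is the direct aggregation of the baseline classifier's predictions.
    """
    n_classes = M.shape[0]
    q = np.bincount(pred_unlabeled, minlength=n_classes) / len(pred_unlabeled)
    p = np.full(n_classes, 1.0 / n_classes)
    for _ in range(n_iter):
        mix = M.T @ p  # predicted-class distribution implied by current p
        ratio = np.divide(q, mix, out=np.zeros_like(q), where=mix > 0)
        p = p * (M @ ratio)  # EM update for the mixing proportions
        p = p / p.sum()
    return p
```

When there are no labeled target-domain cases, `shrunk_error_matrix` returns the identity and `calibrated_fractions` reproduces the raw predicted fractions, which mirrors the special case described in the abstract.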

Funder

Bill and Melinda Gates Foundation

National Institute on Aging

Publisher

Oxford University Press (OUP)

Subject

Statistics, Probability and Uncertainty; General Medicine; Statistics and Probability

Link

http://academic.oup.com/biostatistics/article-pdf/22/4/836/40579950/kxaa001.pdf
