Authors:
Duan Rui, Ning Yang, Chen Yong
Abstract
In multicentre research, individual-level data are often protected against sharing across sites. To overcome the barrier of data sharing, many distributed algorithms, which only require sharing aggregated information, have been developed. The existing distributed algorithms usually assume the data are homogeneously distributed across sites. This assumption ignores the important fact that the data collected at different sites may come from various subpopulations and environments, which can lead to heterogeneity in the distribution of the data. Ignoring the heterogeneity may lead to erroneous statistical inference. We propose distributed algorithms which account for the heterogeneous distributions by allowing site-specific nuisance parameters. The proposed methods extend the surrogate likelihood approach (Wang et al. 2017; Jordan et al. 2018) to the heterogeneous setting by applying a novel density ratio tilting method to the efficient score function. The proposed algorithms maintain the same communication cost as existing communication-efficient algorithms. We establish a nonasymptotic risk bound for the proposed distributed estimator and its limiting distribution in the two-index asymptotic setting, which allows both sample size per site and the number of sites to go to infinity. In addition, we show that the asymptotic variance of the estimator attains the Cramér–Rao lower bound when the number of sites is smaller in rate than the sample size at each site. Finally, we use simulation studies and a real data application to demonstrate the validity and feasibility of the proposed methods.
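For orientation, the homogeneous surrogate likelihood idea that the abstract says it extends can be sketched in a few lines. This is not the proposed density ratio tilting method, which the abstract does not specify in detail. It is a minimal illustration that assumes a Gaussian linear model with no site-specific nuisance parameters, equal sample sizes per site, and hypothetical function names (`local_gradient`, `surrogate_estimate`). Each site shares only a gradient vector, and the first site minimises its local loss corrected by the gap between the local and global gradients.

```python
import numpy as np

def local_gradient(X, y, theta):
    """Gradient of the average squared-error loss (1/2n)||y - X theta||^2 at one site."""
    return X.T @ (X @ theta - y) / len(y)

def surrogate_estimate(sites, theta_bar):
    """One round of a surrogate-likelihood update, solved at the first site.

    sites: list of (X, y) pairs, one per site (equal sample sizes assumed).
    theta_bar: initial estimate broadcast to every site.
    """
    # Each site communicates only a d-dimensional gradient, never raw rows.
    global_grad = np.mean(
        [local_gradient(X, y, theta_bar) for X, y in sites], axis=0
    )
    X1, y1 = sites[0]
    hessian_1 = X1.T @ X1 / len(y1)
    # Minimiser of L_1(theta) - theta^T (grad L_1(theta_bar) - global_grad).
    # For a quadratic loss this has the closed form below.
    return theta_bar - np.linalg.solve(hessian_1, global_grad)

# Simulated homogeneous data: 10 sites, 500 observations each (illustrative only).
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -2.0, 0.5])
sites = []
for _ in range(10):
    X = rng.normal(size=(500, 3))
    y = X @ theta_true + rng.normal(size=500)
    sites.append((X, y))

# Initial value: ordinary least squares using the first site only.
X1, y1 = sites[0]
theta_local = np.linalg.lstsq(X1, y1, rcond=None)[0]
theta_surrogate = surrogate_estimate(sites, theta_local)

# Pooled estimate, available here only because the data are simulated.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
theta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]

print("local     :", theta_local)
print("surrogate :", theta_surrogate)
print("pooled    :", theta_pooled)
```

In this homogeneous sketch one communication round moves the single-site estimate towards the pooled estimate. When sites differ in nuisance parameters, the plain gradient average is no longer appropriate, which is the gap the abstract's tilting of the efficient score function is meant to address.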
Publisher
Oxford University Press (OUP)
Subject
Applied Mathematics; Statistics, Probability and Uncertainty; General Agricultural and Biological Sciences; Agricultural and Biological Sciences (miscellaneous); General Mathematics; Statistics and Probability
References: 30 articles.
1. Barrows. Privacy, confidentiality, and electronic medical records. J. Am. Med. Informatics Assoc., 1996.
2. Battey. Distributed testing and estimation under sparse high dimensional models. Ann. Statist., 2018.
3. Chen. A split-and-conquer approach for analysis of extraordinarily large data. Statist. Sinica, 2014.
4. Cheng. Conducting multicenter research in healthcare simulation: Lessons learned from the INSPIRE network. Adv. Simul., 2017.
5. DerSimonian. Meta-analysis in clinical trials. Contr. Clin. Trials, 1986.
Cited by
34 articles.