Abstract
Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for healthcare research frameworks. This paper aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data; (2) describing the methods applicable to generalized linear models (GLM) and assessing their underlying distributional assumptions; and (3) adapting existing methods to make them fully usable in healthcare research. A scoping review methodology was employed for the literature mapping. From it, methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed for their applicability in healthcare research. From the review, 41 articles were selected, and six approaches were extracted for conducting standard GLM-based statistical analysis. However, these approaches assumed that data were evenly split and identically distributed across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information-sharing requirements and operational complexity.
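The abstract does not name the six approaches. As a hedged illustration of one widely used idea for fitting a GLM on horizontally partitioned data (not necessarily one of the reviewed methods), each node can share only its score vector and Fisher information matrix at the current coefficients. Because both quantities are sums over observations, the coordinator's Newton-Raphson step equals the pooled-data step regardless of how unevenly the sample is split across nodes. The sketch assumes logistic regression, plain Python without external libraries, and hypothetical function names (`local_summaries`, `solve`, `distributed_logistic`) and toy data.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Vector = List[float]


def local_summaries(X: Sequence[Sequence[float]], y: Sequence[float],
                    beta: Vector) -> Tuple[Vector, Matrix]:
    """Run at one node (hypothetical helper): logistic-regression score vector
    and Fisher information matrix at the current beta. Only these aggregates
    leave the node; individual records stay local."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        mu = 1.0 / (1.0 + math.exp(-eta))  # fitted probability (logit link)
        w = mu * (1.0 - mu)                 # GLM working weight
        for j in range(p):
            score[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return score, info


def solve(A: Matrix, b: Vector) -> Vector:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def distributed_logistic(nodes, p: int, tol: float = 1e-10,
                         max_iter: int = 50) -> Vector:
    """Coordinator (hypothetical helper): add up the node summaries and take a
    Newton-Raphson step. The totals equal the pooled-data score and
    information, whatever each node's sample size."""
    beta = [0.0] * p
    for _ in range(max_iter):
        total_score = [0.0] * p
        total_info = [[0.0] * p for _ in range(p)]
        for X, y in nodes:  # one communication round with every node
            s, info = local_summaries(X, y, beta)
            for j in range(p):
                total_score[j] += s[j]
                for k in range(p):
                    total_info[j][k] += info[j][k]
        step = solve(total_info, total_score)
        beta = [b + d for b, d in zip(beta, step)]
        if max(abs(d) for d in step) < tol:
            break
    return beta


if __name__ == "__main__":
    # Three toy nodes of unequal size; columns are (intercept, x).
    node_a = ([[1, -2], [1, -1], [1, 0], [1, 1], [1, 2]], [0, 0, 1, 0, 1])
    node_b = ([[1, -1.5], [1, 0.5], [1, 1.5]], [1, 0, 1])
    node_c = ([[1, 3], [1, -3], [1, 0.2], [1, 2.5]], [1, 0, 0, 1])
    nodes = [node_a, node_b, node_c]
    pooled_X = [row for X, _ in nodes for row in X]
    pooled_y = [v for _, y in nodes for v in y]
    print("distributed:", distributed_logistic(nodes, p=2))
    print("pooled:     ", distributed_logistic([(pooled_X, pooled_y)], p=2))
```

The two calls should agree up to floating-point rounding, since every iteration uses the same totals. The cost is iteration: each Newton step needs another exchange of summaries with all nodes, and the scheme assumes that all nodes share one underlying model, so it does not by itself address heterogeneous data distributions across nodes.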
Publisher
Cold Spring Harbor Laboratory