Abstract
Sharing data across multiple institutions for genome-wide association studies (GWAS) would enable discovery of novel genetic variants linked to health and disease. However, existing regulations on genomic data sharing and the sheer size of the data limit the scope of such collaborations. Although cryptographic tools for secure computation promise to enable collaborative studies with formal privacy guarantees, existing approaches either are computationally impractical or support only simplified analysis pipelines. Here, we introduce secure and federated genome-wide association studies (SF-GWAS), a novel combination of secure computation frameworks that empowers efficient and accurate GWAS in a federated manner, i.e., on private data locally held by multiple entities, while provably ensuring end-to-end data confidentiality. Another key advance is that we designed SF-GWAS to support the two most widely used GWAS pipelines: those based on principal component analysis (PCA) and those based on linear mixed models (LMMs). We ran SF-GWAS on five real GWAS datasets, including a large UK Biobank cohort of 410K individuals, thereby demonstrating the largest secure genetics collaboration to date. SF-GWAS achieves an order-of-magnitude runtime improvement over the prior art for PCA-based GWAS and newly allows secure LMM-based association tests, for which its runtime scales at a near-constant rate in cohort size. Our work realizes the power of secure, collaborative, and accurate GWAS at unprecedented scale and should be applicable to a broad range of analyses. Our open-source software is available at: https://github.com/hhcho/sfgwas.
Publisher
Cold Spring Harbor Laboratory
Cited by
2 articles.