High performance logistic regression for privacy-preserving genome analysis

Published:2021-01-20 Issue:1 Volume:14
ISSN:1755-8794
Container-title:BMC Medical Genomics
language:en
Short-container-title:BMC Med Genomics

Author:

De Cock Martine, Dowsley Rafael, Nascimento Anderson C. A., Railsback Davis, Shen Jianwei, Todoki Ariel

Abstract

Background: In biomedical applications, valuable data is often split between owners who cannot openly share it because of privacy regulations and concerns. Training machine learning models on the joint data without violating privacy is a major technological challenge that can be addressed by combining techniques from machine learning and cryptography. When machine learning models are trained collaboratively with the cryptographic technique known as secure multi-party computation, the price paid for keeping the owners' data private is an increase in computational cost and runtime. A careful choice of machine learning techniques, together with algorithmic and implementation optimizations, is necessary to make secure machine learning over distributed data sets practical. Such optimizations can be tailored to the kind of data and machine learning problem at hand.

Methods: Our setup involves secure two-party computation protocols, along with a trusted initializer that distributes correlated randomness to the two computing parties. We use a gradient descent based algorithm to train a logistic regression-like model with a clipped ReLU activation function, and we break the algorithm down into corresponding cryptographic protocols. Our main contributions are a new protocol for computing the activation function that requires neither secure comparison protocols nor Yao’s garbled circuits, and a series of cryptographic engineering optimizations that improve performance.

Results: For our largest gene expression data set, we train a model that requires over 7 billion secure multiplications; the training completes in about 26.90 s in a local area network. The implementation in this work is a further optimized version of the implementation with which we won first place in Track 4 of the iDASH 2019 secure genome analysis competition.

Conclusions: In this paper, we present a secure logistic regression training protocol and its implementation, with a new subprotocol to securely compute the activation function. To the best of our knowledge, we present the fastest existing secure multi-party computation implementation for training logistic regression models on high dimensional genome data distributed across a local area network.
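
The setup described in the abstract (two computing parties plus a trusted initializer that hands out correlated randomness) can be illustrated with a minimal sketch of one secure multiplication over additive secret shares. This is not the authors' implementation. The ring size `MOD`, every function name, and the use of a single multiplication triple per product are assumptions made for illustration, and both parties are simulated in one process. The clipped ReLU shown is the piecewise-linear form commonly used in this line of work; the abstract does not state the exact definition, so that too is an assumption, and it is evaluated here in plaintext only.

```python
import secrets

MOD = 2 ** 64  # assumed ring Z_{2^64}; the abstract does not specify the ring


def share(x):
    """Split integer x into two additive shares modulo MOD."""
    s0 = secrets.randbelow(MOD)
    s1 = (x - s0) % MOD
    return s0, s1


def reconstruct(s0, s1):
    """Recombine two additive shares into the underlying value."""
    return (s0 + s1) % MOD


def trusted_initializer_triple():
    """Hypothetical trusted initializer: sample u, v, compute w = u*v,
    and return additive shares of each for the two parties."""
    u = secrets.randbelow(MOD)
    v = secrets.randbelow(MOD)
    w = (u * v) % MOD
    return share(u), share(v), share(w)


def secure_multiply(x_sh, y_sh):
    """Multiply secret-shared x and y using one precomputed triple."""
    (u0, u1), (v0, v1), (w0, w1) = trusted_initializer_triple()
    # Each party masks its shares locally; only d = x - u and e = y - v
    # are opened, and they reveal nothing because u and v are uniform.
    d = reconstruct((x_sh[0] - u0) % MOD, (x_sh[1] - u1) % MOD)
    e = reconstruct((y_sh[0] - v0) % MOD, (y_sh[1] - v1) % MOD)
    # x*y = w + d*v + e*u + d*e; party 0 adds the public term d*e.
    z0 = (w0 + d * v0 + e * u0 + d * e) % MOD
    z1 = (w1 + d * v1 + e * u1) % MOD
    return z0, z1


def clipped_relu(x):
    """Plaintext clipped ReLU (assumed form): 0 below -1/2, 1 above 1/2,
    and x + 1/2 in between."""
    return min(1.0, max(0.0, x + 0.5))


product = reconstruct(*secure_multiply(share(6), share(7)))
```

Here `product` reconstructs to the ordinary product of the two inputs, while neither simulated party ever holds an unmasked input. A training run that needs billions of such multiplications would consume one fresh triple per product, which is why the offline distribution of correlated randomness matters for runtime.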

Publisher

Springer Science and Business Media LLC

Subject

Genetics (clinical), Genetics

Link

http://link.springer.com/content/pdf/10.1186/s12920-020-00869-9.pdf

References: 40 articles.

1. Jagadeesh KA, Wu DJ, Birgmeier JA, Boneh D, Bejerano G. Deriving genomic diagnoses without revealing patient genomes. Science. 2017;357(6352):692–5.

2. Mohassel P, Zhang Y. SecureML: a system for scalable privacy-preserving machine learning. In: 2017 IEEE symposium on security and privacy (SP); 2017; p. 19–38.

3. Schoppmann P, Gascón A, Raykova M, Pinkas B. Make some room for the zeros: data sparsity in secure distributed machine learning. In: Proceedings of the 2019 ACM SIGSAC conference on computer and communications security; 2019; p. 1335–50.

4. De Cock M, Dowsley R, Nascimento A, Railsback D, Shen J, Todoki A. Fast secure logistic regression for high dimensional gene data. In: Privacy in machine learning (PriML2019). Workshop at NeurIPS; 2019; p. 1–7.

5. Bonte C, Vercauteren F. Privacy-preserving logistic regression training. BMC Med Genomics. 2018;11(4):86.

Cited by 35 articles.

1. Privacy-preserving logistic regression with improved efficiency;Journal of Information Security and Applications;2024-09

2. Privacy-preserving multi-party logistic regression in cloud computing;Computer Standards & Interfaces;2024-08

3. VPPLR: Privacy-preserving logistic regression on vertically partitioned data using vectorization sharing;Journal of Information Security and Applications;2024-05

4. Advancing IoT security: A systematic review of machine learning approaches for the detection of IoT botnets;Journal of King Saud University - Computer and Information Sciences;2023-12

5. Confidential Training and Inference using Secure Multi-Party Computation on Vertically Partitioned Dataset;Scalable Computing: Practice and Experience;2023-11-17