Affiliation:
1. Guangxi Key Lab of Multi-source Information Mining and Security, Guangxi Normal University, Guilin, China
2. College of Computer Science and Engineering, Guangxi Normal University, Guilin, China
Abstract
Gradient boosting decision trees (GBDTs) have become a popular machine learning algorithm and have excelled in many data mining competitions and real-world applications thanks to their strong results on classification, ranking, prediction, and similar tasks. Federated learning, which aims to mitigate privacy risks and costs, enables many entities to keep their data local and train a model collaboratively under an orchestration service. However, most existing federated GBDT systems fail to strike a good trade-off between accuracy and communication. In addition, they overlook an important aspect: fairness, such as the performance gain contributed by each party's dataset. In this paper, we propose a novel blockchain-based federated GBDT scheme that achieves constant communication overhead and good model performance and quantifies the contribution of each party. Specifically, we replace the tree-based communication scheme with a purely gradient-based scheme and compress the intermediate gradient information to a fixed limit, which yields good model performance and constant communication overhead on skewed datasets. We also introduce a novel contribution allocation scheme named split Shapley value, which quantifies the contribution of each party from a limited gradient update and provides a basis for monetary reward. Finally, we integrate the quantification mechanism with the blockchain, implement a closed-loop federated GBDT system, FGBDT-Chain, in a permissioned blockchain environment, and conduct comprehensive experiments on public datasets. The experimental results show that FGBDT-Chain achieves a good trade-off among accuracy, communication overhead, fairness, and security on large-scale skewed datasets.
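To make the idea of contribution quantification concrete, the sketch below computes the classical (exact) Shapley value for a small set of parties. It is not the split Shapley value proposed in the paper, whose details are not given in the abstract. The function name `shapley_values`, the party labels, and the utility table are hypothetical and chosen only for illustration; the utility stands in for any score (for example, validation accuracy) obtained by a coalition of parties.

```python
from itertools import combinations
from math import factorial


def shapley_values(parties, utility):
    """Exact classical Shapley value of each party.

    parties: list of hashable party identifiers.
    utility: callable mapping a frozenset of parties to a float score.
    Cost grows exponentially with the number of parties.
    """
    n = len(parties)
    values = {p: 0.0 for p in parties}
    for p in parties:
        others = [q for q in parties if q != p]
        for k in range(n):
            # Weight of a coalition of size k that excludes party p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal gain from adding p to the coalition.
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values


# Hypothetical utility table (made-up scores, not results from the paper).
toy_utility = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.60,
    frozenset({"B"}): 0.50,
    frozenset({"A", "B"}): 0.80,
}

result = shapley_values(["A", "B"], toy_utility.__getitem__)
print(result)
```

With this toy table, party A is credited with about 0.45 and party B with about 0.35, and the two shares sum to the utility of the full coalition (0.80), which is the efficiency property that makes Shapley-style allocations a natural basis for rewards.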
Funder
National Natural Science Foundation of China
Subject
Computer Networks and Communications, Information Systems
Cited by
3 articles.