Leveraging local data sampling strategies to improve federated learning
Published: 2024-08-29
ISSN: 2364-415X
Journal: International Journal of Data Science and Analytics (Int J Data Sci Anal)
Language: en
Authors: Christoph Düsing, Philipp Cimiano, Benjamin Paaßen
Abstract
Federated learning (FL) facilitates shared training of machine learning models while maintaining data privacy. Unfortunately, it suffers from data imbalance among participating clients, causing the performance of the shared model to drop. To diminish the negative effects of unfavourable data-specific properties, both algorithm- and data-based approaches seek to make FL more resilient against them. In this regard, data-based approaches prove to be more versatile and require less domain knowledge to be applied efficiently. Hence, they seem particularly suitable for widespread application in various FL environments. Although data-based approaches such as local data sampling have been applied to FL in the past, previous research did not provide a systematic analysis of the potential and limitations of individual data sampling strategies to improve FL. To this end, we (1) identify relevant local data sampling strategies applicable to FL systems, (2) identify data-specific properties that negatively affect FL system performance, and (3) provide a benchmark of local data sampling strategies regarding their effect on model performance, convergence, and training time in synthetic, real-world, and large-scale FL environments. Moreover, we propose and rigorously test a novel method for data sampling in FL that locally optimizes the choice of sampling strategy prior to FL participation. Our results show that FL can greatly benefit from applying local data sampling in terms of performance and convergence rate, especially when data imbalance is high or the number of clients and samples is low. Furthermore, our proposed sampling strategy offers the best trade-off between model performance and training time.
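The abstract refers to local data sampling strategies that a client applies to its own data before participating in FL. The paper does not spell out concrete implementations here, so the following is a minimal sketch of two standard such strategies, random oversampling and random undersampling of class-imbalanced local data; the function names and the toy dataset are illustrative assumptions, not the paper's API.

```python
# Hedged sketch of local data sampling before FL participation.
# `oversample`/`undersample` are illustrative names, not the paper's method.
import random
from collections import Counter

random.seed(0)

def oversample(samples):
    """Randomly duplicate minority-class samples until every class
    reaches the majority class size."""
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append((x, y))
    target = max(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(items)
        balanced.extend(random.choices(items, k=target - len(items)))
    return balanced

def undersample(samples):
    """Randomly drop majority-class samples down to the minority class size."""
    by_class = {}
    for x, y in samples:
        by_class.setdefault(y, []).append((x, y))
    target = min(len(items) for items in by_class.values())
    balanced = []
    for items in by_class.values():
        balanced.extend(random.sample(items, target))
    return balanced

# Toy imbalanced local client data: 90 samples of class 0, 10 of class 1.
local_data = [(i, 0) for i in range(90)] + [(i, 1) for i in range(10)]

print(Counter(y for _, y in oversample(local_data)))   # both classes at 90
print(Counter(y for _, y in undersample(local_data)))  # both classes at 10
```

In the paper's setting, each client would evaluate candidate strategies like these on its own data and pick one before joining the federated training rounds; the selection criterion itself is part of the proposed method and is not reproduced here.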
Funder: Universität Bielefeld
Publisher: Springer Science and Business Media LLC