Communication Efficiency and Non-Independent and Identically Distributed Data Challenge in Federated Learning: A Systematic Mapping Study

Author:

Alotaibi Basmah12ORCID,Khan Fakhri Alam134ORCID,Mahmood Sajjad13ORCID

Affiliation:

1. Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

2. Department of Computer Science, College of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University (IMSIU), Riyadh 13318, Saudi Arabia

3. Interdisciplinary Research Centre for Intelligent Secure Systems, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

4. SDAIA-KFUPM Joint Research Center for Artificial Intelligence, King Fahd University of Petroleum and Minerals, Dhahran 31261, Saudi Arabia

Abstract

Federated learning has emerged as a promising approach for collaborative model training across distributed devices. Federated learning faces challenges such as Non-Independent and Identically Distributed (non-IID) data and communication challenges. This study aims to provide in-depth knowledge in the federated learning environment by identifying the most used techniques for overcoming non-IID data challenges and techniques that provide communication-efficient solutions in federated learning. The study highlights the most used non-IID data types, learning models, and datasets in federated learning. A systematic mapping study was performed using six digital libraries, and 193 studies were identified and analyzed after the inclusion and exclusion criteria were applied. We identified that enhancing the aggregation method and clustering are the most widely used techniques for non-IID data problems (used in 18% and 16% of the selected studies), and a quantization technique was the most common technique in studies that provide communication-efficient solutions in federated learning (used in 27% and 15% of the selected studies). Additionally, our work shows that label distribution skew is the most used case to simulate a non-IID environment, specifically, the quantity label imbalance. The supervised learning model CNN model is the most commonly used learning model, and the image datasets MNIST and Cifar-10 are the most widely used datasets when evaluating the proposed approaches. Furthermore, we believe the research community needs to consider the client’s limited resources and the importance of their updates when addressing non-IID and communication challenges to prevent the loss of valuable and unique information. The outcome of this systematic study will benefit federated learning users, researchers, and providers.

Funder

Saudi Data and AI Authority

King Fahd University of Petroleum and Minerals

Publisher

MDPI AG

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3