Evaluation of Federated Learning in Phishing Email Detection

Author:

Thapa Chandra (1), Tang Jun Wen (2), Abuadbba Alsharif (1, 3), Gao Yansong (1), Camtepe Seyit (1), Nepal Surya (1, 3), Almashor Mahathir (1), Zheng Yifeng (4)

Affiliation:

1. Commonwealth Scientific and Industrial Research Organisation, Data61, Sydney 2122, Australia

2. School of Chemical Engineering, The University of New South Wales, Sydney 2052, Australia

3. Cyber Security Cooperative Research Centre, Australian Capital Territory 2604, Australia

4. Harbin Institute of Technology, Harbin 150001, China

Abstract

The use of artificial intelligence (AI) to detect phishing emails depends primarily on large-scale centralized datasets, which exposes it to a myriad of privacy, trust, and legal issues. Moreover, organizations have been loath to share emails, given the risk of leaking commercially sensitive information. Consequently, it has been difficult to obtain sufficient emails to train a global AI model efficiently. Accordingly, privacy-preserving distributed and collaborative machine learning, particularly federated learning (FL), is a desideratum. Although FL is already prevalent in the healthcare sector, questions remain regarding its effectiveness and efficacy for phishing detection within the context of multi-organization collaborations. To the best of our knowledge, this work is the first to investigate the use of FL in phishing email detection. This study builds upon deep neural network models, particularly a recurrent convolutional neural network (RNN) and bidirectional encoder representations from transformers (BERT), for phishing email detection. We analyzed the learning performance under FL in various settings, including (i) balanced and asymmetrical data distributions among organizations and (ii) scalability. Our results show that, for balanced datasets and low organizational counts, FL achieves performance comparable to centralized learning in phishing email detection. Moreover, we observed a variation in performance when increasing the organizational counts. For a fixed total email dataset, the global RNN-based model had a 1.8% accuracy decrease when the organizational counts were increased from 2 to 10. In contrast, BERT accuracy increased by 0.6% when increasing organizational counts from 2 to 5. However, if we increased the overall email dataset by introducing new organizations in the FL framework, the organizational-level performance improved by achieving a faster convergence speed. In addition, FL suffered in its overall global model performance due to highly unstable outputs if the email dataset distribution was highly asymmetric.
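
As an illustration of how a global model can be formed from organization-level models without sharing emails, the following is a minimal sketch of one aggregation round. It assumes the widely used federated averaging (FedAvg) rule, in which each organization's parameters are weighted by its local sample count; the abstract does not state which aggregation rule the authors used. The function name `federated_average`, the two-parameter "models", and the sample counts are all hypothetical.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Return the sample-size-weighted mean of client parameter vectors.

    This is a FedAvg-style aggregation step (an assumption, not necessarily
    the paper's method): clients with more local emails pull the global
    model more strongly toward their own parameters.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one parameter vector and one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all parameter vectors must have the same length")

    global_weights = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        share = size / total  # this client's fraction of all samples
        for i, value in enumerate(weights):
            global_weights[i] += share * value
    return global_weights


# Hypothetical local models from two organizations (two parameters each).
org_a = [1.0, 2.0]
org_b = [3.0, 4.0]

# Balanced split: both organizations hold the same number of emails.
balanced = federated_average([org_a, org_b], [500, 500])

# Asymmetric split: organization A holds three times as many emails.
skewed = federated_average([org_a, org_b], [750, 250])

print(balanced)  # [2.0, 3.0]
print(skewed)    # [1.5, 2.5]
```

Under this rule, an asymmetric split moves the global parameters toward the larger organization's local model, which gives some intuition for why data distribution across organizations can matter for the global model.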

Funder

Cyber Security Cooperative Research Centre

Publisher

MDPI AG

Subject

Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry

Cited by 5 articles.