Abstract
To respond accurately and promptly to different types of attacks, intrusion detection systems collect and analyze large amounts of data, which may include restricted-access information such as personal data or trade secrets. Consequently, such systems can themselves become an additional source of risk associated with handling sensitive information and with breaches of its security. Applying the federated learning paradigm to build analytical models for attack and anomaly detection can significantly reduce these risks, because locally generated data is not transmitted to any third party and model training is performed locally, at the data sources. Federated learning for intrusion detection also addresses the problem of training on data that belongs to different organizations and that cannot be made public because of the need to protect commercial or other secrets. The approach thus makes it possible to expand and diversify the data on which machine learning models are trained, thereby improving the detection of heterogeneous attacks. Because it can overcome these problems, federated learning is actively used to design new approaches to intrusion and anomaly detection. The authors systematically review existing federated learning-based solutions for intrusion and anomaly detection and study their advantages. Particular attention is paid to the architecture of the proposed systems, the intrusion detection methods and models used, and the approaches for modeling interactions between multiple system users and distributing data among them. The authors conclude by formulating open problems that must be solved before federated learning-based intrusion detection systems can be applied in practice.
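The core idea, that each data owner trains locally and shares only model parameters, can be illustrated with a minimal federated averaging sketch. It is a toy illustration under stated assumptions, not the method of any surveyed system. A one-dimensional linear model stands in for a detection model, and all names (`local_update`, `federated_average`, `run_rounds`) and the data are hypothetical.

```python
# Minimal federated-averaging sketch in pure Python.
# Assumption: a toy linear model y = w*x + b stands in for a real
# intrusion detection model; all names and data are hypothetical.

def local_update(weights, data, lr=0.1, epochs=5):
    """Run a few gradient-descent epochs on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return (w, b)


def federated_average(updates):
    """Average client weights, weighted by each client's sample count."""
    total = sum(n for _, n in updates)
    w = sum(weights[0] * n for weights, n in updates) / total
    b = sum(weights[1] * n for weights, n in updates) / total
    return (w, b)


def run_rounds(clients, rounds=100):
    """Server loop: only weights and sample counts leave the clients."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [
            (local_update(global_weights, data), len(data))
            for data in clients
        ]
        global_weights = federated_average(updates)
    return global_weights


# Two "organizations", each holding private samples of y = 2x + 1.
clients = [
    [(-1.0, -1.0), (0.0, 1.0)],
    [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)],
]
w, b = run_rounds(clients)
```

The raw `(x, y)` samples never leave `local_update`; the server sees only the returned weights and sample counts. Model updates can still leak information, so the sketch does not by itself guarantee privacy.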
Subject
Artificial Intelligence, Applied Mathematics, Computational Theory and Mathematics, Computational Mathematics, Computer Networks and Communications, Information Systems