Apache Spark and MLlib-Based Intrusion Detection System or How the Big Data Technologies Can Secure the Data

Published: 2022-01-24
Volume: 13, Issue: 2, Page: 58
ISSN: 2078-2489
Journal: Information
Language: English

Authors:

Azeroual, Otmane; Nikiforova, Anastasija

Abstract

Since the turn of the millennium, the volume of data in both industry and scientific institutions has increased significantly. Conventional software solutions are unlikely to cope with the volume and variety of the data we now deal with. Consequently, technologies from the big data processing area, which can distribute and process data in a scalable way, are being integrated into classical Business Intelligence (BI) systems or are replacing them. Big data technologies can also be used to gain security-related knowledge from massive databases. The paper presents a security-relevant data analysis based on the big data analytics engine Apache Spark. A prototype intrusion detection system is developed that detects data anomalies through machine learning, using the k-means clustering algorithm implemented in Spark's MLlib. Extracting features for anomaly detection is currently challenging because anomalies are not actively and exhaustively monitored. Abnormal data can be detected using relevant data that companies and scientific organizations already possess, and interpreting and continuously processing these data can contribute significantly to anomaly and intrusion detection.
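
The clustering-based detection described in the abstract can be illustrated with a minimal, stdlib-only Python sketch. It is not the authors' code and does not use Spark: it runs Lloyd's k-means on "normal" feature vectors and flags incoming records whose distance to the nearest cluster centre exceeds a threshold derived from the training data. The function names (`kmeans`, `detect_anomalies`), the deterministic farthest-point initialisation, the quantile threshold rule and the toy two-feature records are all assumptions made for illustration; Spark MLlib's `KMeans` performs the clustering step in a distributed way and defaults to k-means|| initialisation.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_distance(p: Point, centroids: List[Point]) -> float:
    """Anomaly score: distance from p to its closest cluster centre."""
    return min(distance(p, c) for c in centroids)


def kmeans(points: List[Point], k: int, max_iter: int = 20) -> List[Point]:
    """Lloyd's k-means with a deterministic farthest-point initialisation.

    Assumption: farthest-point seeding is used here only for reproducibility;
    Spark MLlib's KMeans defaults to k-means|| initialisation instead.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Seed the next centre at the point farthest from all existing centres.
        centroids.append(max(points, key=lambda p: nearest_distance(p, centroids)))
    for _ in range(max_iter):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            # Assignment step: attach each point to its closest centre.
            idx = min(range(k), key=lambda i: distance(p, centroids[i]))
            clusters[idx].append(p)
        # Update step: move each centre to the mean of its members
        # (an empty cluster keeps its previous centre).
        updated = [
            tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # converged: the centres no longer move
        centroids = updated
    return centroids


def detect_anomalies(
    normal: List[Point], incoming: List[Point], k: int, quantile: float = 1.0
) -> List[Point]:
    """Flag incoming records that lie farther from every centre than the given
    quantile of the training distances (hypothetical threshold rule)."""
    centroids = kmeans(normal, k)
    scores = sorted(nearest_distance(p, centroids) for p in normal)
    threshold = scores[min(len(scores) - 1, int(quantile * len(scores)))]
    return [p for p in incoming if nearest_distance(p, centroids) > threshold]


# Hypothetical two-feature records forming two groups of "normal" behaviour.
normal = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2), (5.0, 5.1)]
incoming = [(1.0, 1.05), (5.0, 5.0), (9.0, 0.5)]
print(detect_anomalies(normal, incoming, k=2))  # [(9.0, 0.5)]
```

Scoring each record by its distance to the nearest centre means only "normal" data is needed for training, which matches the abstract's point that organizations can reuse data they already hold. The threshold choice is a tuning decision that the abstract does not specify; in a Spark deployment, the clustering step would be delegated to MLlib's `KMeans` estimator.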

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/13/2/58/pdf

References (32 articles)

1. Industry 4.0, a revolution that requires technology and national strategies

2. A survey of emerging threats in cybersecurity

3. Metadata and Data Quality Problems in the Digital Library; Beall; J. Digit. Inf.; 2005

4. Big Data Concepts, Theories, and Applications; Yu; 2016

5. Praxishandbuch Big Data; Dorsche; 2015

Cited by (20 articles)

1. Application research of credit fraud detection based on distributed rotation deep forest; Intelligent Data Analysis; 2024-07-17

2. Advancing big data clustering with fuzzy logic-based IMV-FCA and ensemble approach; IRAN J FUZZY SYST; 2024

3. Money Laundering in the Age of Cybercrime and Emerging Technologies; Corruption, Bribery, and Money Laundering - Global Issues; 2024-01-03

4. Cost based Random Forest Classifier for Intrusion Detection System in Internet of Things; Applied Soft Computing; 2024-01

5. Recommender System with Apache Spark; Lecture Notes in Networks and Systems; 2024