Author:
Yang Jinliang, Lan Xuan, Huang Liansheng, Zeng Jigang
Abstract
The technical requirements for behavior anomaly detection are becoming increasingly demanding. Combining Internet of Things (IoT) technology with a variety of big data analysis algorithms makes it possible to detect behavior anomalies accurately by classifying behavior data sets. In this paper, the PLA-PRF (parallel random forest) algorithm is used to build an IoT behavior anomaly detection model that integrates big data analysis. For behavior detection, the PRF and DRF algorithms are compared across different numbers of decision trees. The results show that, compared with the DRF algorithm, PLA-PRF, Spark MLRF (Spark Machine Learning Random Forests), and PRF perform better on the four datasets, with kappa values higher by about 3.13%, 2.56%, and 1.98%, respectively. PLA-PRF is more accurate when the sample size is small. As the sample size increases, the accuracy of behavior anomaly detection gradually decreases: because the algorithm constructs feature subspaces, some high pheromone features are discarded, so the new feature space carries insufficient information and the decision tree training process does not learn the inherent patterns of the discarded data. Compared with Spark MLRF and DRF, PLA-PRF executes faster on large data sets, and the advantage becomes more pronounced as the data volume grows. This is because PLA-PRF uses a data reuse strategy (DRS) during parallelization, which reduces data communication overhead in a distributed environment and improves the parallelization efficiency of the algorithm.
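The kappa values reported above are presumably Cohen's kappa, which measures agreement between predicted and true labels after correcting for chance agreement. The abstract does not say how kappa was computed, so the following is only a minimal sketch of the standard definition, kappa = (p_o - p_e) / (1 - p_e); the function name `cohen_kappa` and the toy labels are illustrative assumptions, not data or code from the paper.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for two equal-length label sequences (illustrative helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement p_o: fraction of samples where both labels match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement p_e: computed from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both sequences contain one identical class.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (1 = anomalous, 0 = normal); invented purely for illustration.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(cohen_kappa(y_true, y_pred))
```

Unlike raw accuracy, kappa discounts the agreement a classifier would reach by guessing according to the class frequencies, which is useful when anomalous behavior is rare relative to normal behavior.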
Subject
General Physics and Astronomy