Affiliation:
1. College of Systems Engineering, National University of Defense Technology, Changsha 410000, China
2. Software College, Northeastern University, Shenyang 110000, China
Abstract
With the rise of fields such as e-commerce platforms, large volumes of stream data have emerged. Incomplete labeling and concept drift in these data pose a major challenge to existing stream data classification methods. To address this, a dynamic classification algorithm for stream data is proposed. For the incomplete labeling problem, the method introduces randomization and an iterative strategy into the very fast decision tree (VFDT) algorithm to design an iterative ensemble algorithm: the classification result of the previous model is used as the input of the next model, and a voting mechanism is used to classify new data. At the same time, a window mechanism is used to store data and to calculate the distribution characteristics of the data in the window; the size of the sliding window is then adjusted based on the calculated result and the predicted amount of data. Experiments show the superiority of the algorithm in classification accuracy. The aim of the study is to compare different algorithms in order to evaluate whether the classification model adapts to the current data environment.
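The three ingredients named in the abstract (the Hoeffding bound that VFDT uses to decide when a split is safe, majority voting over an ensemble, and a sliding window whose size reacts to the data distribution) can be illustrated with a minimal sketch. This is not the authors' algorithm: the class `AdaptiveWindow`, its parameters, and the half-versus-half mean comparison used as the "distribution characteristic" are assumptions made purely for illustration. Only the formula in `hoeffding_bound` is the standard one from the VFDT literature.

```python
import math
from collections import Counter, deque


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Standard Hoeffding bound: with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean of n samples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def majority_vote(predictions):
    """Return the label predicted by the most ensemble members."""
    return Counter(predictions).most_common(1)[0][0]


class AdaptiveWindow:
    """Hypothetical sliding window (an assumption, not the paper's rule):
    it shrinks when the older and newer halves disagree, and grows otherwise."""

    def __init__(self, size=8, min_size=4, max_size=64, threshold=0.5):
        self.size = size
        self.min_size = min_size
        self.max_size = max_size
        self.threshold = threshold
        self.buffer = deque()

    def _trim(self):
        # Drop the oldest items until the buffer fits the current window size.
        while len(self.buffer) > self.size:
            self.buffer.popleft()

    def add(self, x: float) -> bool:
        """Store x, resize the window, and report whether drift was suspected."""
        self.buffer.append(x)
        self._trim()
        drift = False
        if len(self.buffer) >= self.min_size:
            items = list(self.buffer)
            half = len(items) // 2
            old_mean = sum(items[:half]) / half
            new_mean = sum(items[half:]) / (len(items) - half)
            drift = abs(new_mean - old_mean) > self.threshold
        if drift:
            # Suspected drift: keep only recent data by halving the window.
            self.size = max(self.min_size, self.size // 2)
            self._trim()
        else:
            # Stable stream: allow the window to grow slowly.
            self.size = min(self.max_size, self.size + 1)
        return drift


if __name__ == "__main__":
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(majority_vote(["spam", "ham", "spam"]))
    window = AdaptiveWindow()
    stream = [0.0] * 8 + [1.0] * 8
    print([window.add(x) for x in stream])
```

Halving the window on suspected drift discards old data quickly, while growing it by one item per stable step lets the model use more history when the distribution is not changing. A real implementation would replace the mean comparison with whatever distribution statistic the method actually prescribes.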
Funder
National Natural Science Foundation of China
Subject
Computer Networks and Communications, Computer Science Applications
Cited by
3 articles.