Affiliation:
1. Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China
2. Southeast University, Nanjing, Jiangsu, China
Abstract
Many practical applications, such as social media and monitoring systems, continuously generate streaming data, which suffers from instability, a lack of labels, and multiclass imbalance. To address these problems, a cluster-based active learning method for data stream classification is proposed. First, a label query strategy incorporating a marginal threshold matrix is proposed; it selects samples that are difficult to classify or that may indicate concept drift for labeling, which addresses the high cost of labels and the imbalance of the data. Second, a group of micro-clusters is maintained dynamically, and the weights of the micro-clusters in the model are adjusted so that the model correctly reflects the current data distribution. Finally, a buffer stores new micro-clusters, which take part in updating the model so that it adapts to the new data environment. Experimental results on three real datasets and three synthetic datasets show that, compared with classical data stream classification algorithms, the method is less affected by concept drift, and that it achieves higher classification accuracy than the online semi-supervised learning algorithm ADSM. The average accuracy on the six datasets increased by 5.56%, 2.32%, 1.77%, 1.83%, 3.78%, and 2.04%, respectively. The model processes data streams online and improves classification performance with less memory consumption.
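The two mechanisms named in the abstract, margin-based label querying and weighted micro-clusters, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define the marginal threshold matrix or the weight update rule, so the per-class threshold list, the exponential decay, and all names below (`should_query`, `MicroCluster`, `prune`) are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass


def margin(probabilities):
    """Gap between the two largest class probabilities."""
    top = sorted(probabilities, reverse=True)
    return top[0] - top[1]


def should_query(probabilities, thresholds):
    """Request a label when the margin is below the predicted class's threshold.

    `thresholds` holds one value per class (an assumed simplification of the
    paper's marginal threshold matrix), so minority classes can be queried
    more aggressively than majority classes.
    """
    predicted = max(range(len(probabilities)), key=probabilities.__getitem__)
    return margin(probabilities) < thresholds[predicted]


@dataclass
class MicroCluster:
    center: list
    label: int
    weight: float = 1.0
    count: int = 1

    def absorb(self, x):
        """Move the center toward x (incremental mean) and reinforce the weight."""
        self.count += 1
        self.center = [c + (xi - c) / self.count for c, xi in zip(self.center, x)]
        self.weight += 1.0

    def decay(self, rate):
        """Fade the weight exponentially so stale clusters lose influence."""
        self.weight *= math.exp(-rate)


def prune(clusters, min_weight):
    """Keep only clusters whose weight is still at least min_weight."""
    return [c for c in clusters if c.weight >= min_weight]
```

In this sketch, a sample with class probabilities `[0.5, 0.45, 0.05]` and thresholds of `0.1` per class would be sent for labeling, because the top two classes are nearly tied. Calling `decay` on every cluster at each time step and then `prune` is one simple way to let outdated clusters drop out as the distribution drifts.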
Funder
National Natural Science Foundation of China
Ant Group
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Theoretical Computer Science
Cited by
8 articles.