Clustering-based Active Learning Classification towards Data Stream

Author:

Yin Chunyong (1), Chen Shuangshuang (1), Yin Zhichao (2)

Affiliation:

1. Nanjing University of Information Science and Technology, Nanjing, Jiangsu, China

2. Southeast University, Nanjing, Jiangsu, China

Abstract

Many practical applications, such as social media and monitoring systems, continuously generate streaming data that is unstable, lacks labels, and suffers from multiclass imbalance. To address these problems, a clustering-based active learning method is proposed for data stream classification. First, a label query strategy combined with a marginal threshold matrix selects samples that are difficult to classify, or that may indicate concept drift, for labeling; this addresses the high cost of labels and the imbalance of the data. Second, a group of micro-clusters is maintained dynamically, and the weights of the micro-clusters in the model are adjusted so that the model correctly reflects the current data distribution. Finally, a buffer stores new micro-clusters, which take part in updating the model so that it adapts to the new data environment. Experimental results on three real data sets and three synthetic data sets show that the method is less affected by concept drift than classical data stream classification algorithms and achieves higher classification accuracy than the online semi-supervised learning algorithm ADSM. The average accuracy on the six data sets increased by 5.56%, 2.32%, 1.77%, 1.83%, 3.78%, and 2.04%, respectively. The model processes data streams online and improves classification performance with less memory consumption.
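The three steps named in the abstract (margin-based label queries, weighted micro-clusters, and a buffer for new micro-clusters) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no formulas, so the scoring rule, the single scalar threshold (standing in for the marginal threshold matrix), the decay factor, the radius, and every name below are assumptions chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class MicroCluster:
    # Hypothetical structure: a labeled centroid with an importance weight.
    center: list
    label: int
    weight: float = 1.0
    count: int = 1


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def class_scores(x, clusters):
    # Assumed scoring rule: each class is scored by its best
    # weight / (1 + distance) over that class's micro-clusters.
    scores = {}
    for mc in clusters:
        s = mc.weight / (1.0 + distance(x, mc.center))
        scores[mc.label] = max(scores.get(mc.label, 0.0), s)
    return scores


def margin(scores):
    # Normalized gap between the best and second-best class score.
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        return 1.0 if ranked else 0.0
    return (ranked[0] - ranked[1]) / sum(ranked)


def should_query(x, clusters, threshold=0.2):
    # Ask for a label when the sample is hard to classify (small margin).
    return margin(class_scores(x, clusters)) < threshold


def update(x, label, clusters, buffer, radius=1.0, decay=0.99, min_weight=0.05):
    # Decay all weights so that stale micro-clusters lose influence.
    for mc in clusters:
        mc.weight *= decay
    same = [mc for mc in clusters if mc.label == label]
    nearest = min(same, key=lambda mc: distance(x, mc.center), default=None)
    if nearest is not None and distance(x, nearest.center) <= radius:
        # Absorb the sample: incremental mean update of the centroid.
        nearest.count += 1
        nearest.center = [c + (xi - c) / nearest.count
                          for c, xi in zip(nearest.center, x)]
        nearest.weight += 1.0
    else:
        # Sample fits no existing micro-cluster: park a new one in the buffer.
        buffer.append(MicroCluster(list(x), label))
    # Drop micro-clusters whose weight has decayed below the floor.
    clusters[:] = [mc for mc in clusters if mc.weight >= min_weight]


def promote(clusters, buffer):
    # Move buffered micro-clusters into the model.
    clusters.extend(buffer)
    buffer.clear()
```

In this sketch a sample lying midway between two micro-clusters of different classes has a margin of zero and triggers a label query, whereas a sample close to one centroid does not. A labeled sample that falls outside the radius of every same-class micro-cluster is buffered rather than merged, which loosely mirrors the buffer step described above.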

Funder

National Natural Science Foundation of China

Ant Group

Publisher

Association for Computing Machinery (ACM)

Subject

Artificial Intelligence, Theoretical Computer Science

