Concept Evolution Detecting over Feature Streams

Published:2024-08-21 Issue:8 Volume:18 Page:1-32
ISSN:1556-4681
Container-title:ACM Transactions on Knowledge Discovery from Data
language:en
Short-container-title:ACM Trans. Knowl. Discov. Data

Author:

Zhou Peng¹, Guo Yufeng¹, Yu Haoran¹, Yan Yuanting¹, Zhang Yanping¹, Wu Xindong²

Affiliation:

1. Key Laboratory of Intelligent Computing and Signal Processing (Anhui University), Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China

2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei University of Technology, Hefei, China

Abstract

The explosion of data volume has gradually shifted big data processing from static batch mode to an online streaming model. Streaming data can be divided into instance streams (the feature space remains fixed while instances increase over time), feature streams (the instance space is fixed while features arrive over time), or both. In general, learning from online streaming data faces two main challenges: infinite length and concept change. Feature stream learning has recently received much attention. However, existing feature stream learning methods focus on feature selection or classification and ignore how concepts change over time. To the best of our knowledge, this is the first work to study concept evolution detection over feature streams. Specifically, we first give a formal definition of concept evolution over feature streams, which includes three types: concept emerging, concept drift, and concept forgetting. We then design a novel framework for detecting concept evolution over feature streams that consists of a sliding window, an improved density peak-based clustering algorithm, and a weighted bipartite graph-based concept detection method. Extensive experiments on several synthetic and high-dimensional datasets demonstrate the new method's ability to cluster and to detect concept evolution over feature streams.
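To make the last stage of the framework concrete, the following is a minimal illustrative sketch, not the authors' algorithm. It assumes that each sliding window over the feature stream has already been clustered (by any method; the paper's improved density peak-based clustering is not reproduced here), so that every window yields one cluster label per instance. Because the instance space is fixed in a feature stream, clusters from two consecutive windows can be compared by the instances they share. The Jaccard edge weight, the `match` threshold, and the rules used to label an event as emerging, drift, or forgetting are all assumptions made for illustration; the paper's formal definitions may differ.

```python
from collections import defaultdict


def clusters_from_labels(labels):
    """Group instance indices by cluster label (one label per instance)."""
    groups = defaultdict(set)
    for idx, lab in enumerate(labels):
        groups[lab].add(idx)
    return dict(groups)


def bipartite_weights(prev, curr):
    """Weighted bipartite graph between previous and current clusters.

    Assumption: the edge weight is the Jaccard overlap of the two
    clusters' instance sets; edges with zero overlap are omitted.
    """
    weights = {}
    for p, p_members in prev.items():
        for c, c_members in curr.items():
            shared = len(p_members & c_members)
            if shared:
                weights[(p, c)] = shared / len(p_members | c_members)
    return weights


def detect_evolution(prev_labels, curr_labels, match=0.5):
    """Label changes between two consecutive windows (hypothetical rules).

    - a current cluster whose best edge is below `match`: "emerging"
    - a current cluster matched with weight in [match, 1): "drift"
    - a previous cluster whose best edge is below `match`: "forgetting"
    """
    prev = clusters_from_labels(prev_labels)
    curr = clusters_from_labels(curr_labels)
    w = bipartite_weights(prev, curr)
    events = []
    for c in curr:
        best = max((w.get((p, c), 0.0) for p in prev), default=0.0)
        if best < match:
            events.append(("emerging", c))
        elif best < 1.0:
            events.append(("drift", c))
    for p in prev:
        best = max((w.get((p, c), 0.0) for c in curr), default=0.0)
        if best < match:
            events.append(("forgetting", p))
    return events


# Six instances; in the newer window, instance 5 leaves cluster 1
# and forms its own cluster 2.
print(detect_evolution([0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 2]))
```

In this toy run, cluster 0 is unchanged, cluster 1 keeps two of its three instances and is reported as drift, and the new singleton cluster 2 is reported as emerging. A real detector would also need to decide how wide the sliding window is and how to handle clusters that split or merge, which this sketch does not address.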

Funder

National Natural Science Foundation of China

Science Foundation of Anhui Province of China

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3678012
