Affiliation:
1. Key Laboratory of Intelligent Computing and Signal Processing (Anhui University), Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, China
2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei University of Technology, Hefei, China
Abstract
The explosion of data volume has gradually shifted big data processing from the static batch mode to the online streaming model. Streaming data can be divided into instance streams (the feature space remains fixed while instances increase over time), feature streams (the instance space is fixed while features arrive over time), or both. Generally, learning from online streaming data faces two main challenges: infinite length and concept change. Recently, feature stream learning has received much attention. However, existing feature stream learning methods focus on feature selection or classification and ignore how concepts change over time. To the best of our knowledge, this is the first work that studies concept evolution detection over feature streams. Specifically, we first give a formal definition of concept evolution over feature streams, which includes three types: concept emerging, concept drift, and concept forgetting. Then, we design a novel framework to detect concept evolution over feature streams that consists of a sliding window, an improved density peak-based clustering algorithm, and a weighted bipartite graph-based concept detection method. Extensive experiments on several synthetic and high-dimensional datasets demonstrate the ability of the proposed method to cluster and detect concept evolution over feature streams.
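To make the first two components of the framework concrete, here is a minimal sketch, not the authors' implementation. It pairs a fixed-size sliding window over a feature stream with the classic, unimproved density-peak clustering of Rodriguez and Laio; the abstract does not specify the paper's improvements, and they are not reproduced here. The names windowed_features and density_peaks, the cutoff d_c, and the parameter n_centers are hypothetical, and treating each arriving feature as one point (a vector over the fixed instances) is an assumption.

```python
from collections import deque

import numpy as np


def windowed_features(feature_stream, window_size):
    """Yield the most recent `window_size` features as an array (one row per feature)."""
    window = deque(maxlen=window_size)  # oldest feature is dropped automatically
    for feature in feature_stream:
        window.append(np.asarray(feature, dtype=float))
        if len(window) == window_size:
            yield np.stack(list(window))


def density_peaks(points, d_c, n_centers):
    """Classic density-peak clustering (assumed baseline, not the paper's improved variant)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    # Local density: number of other points closer than the cutoff d_c.
    rho = (dist < d_c).sum(axis=1) - 1
    # Visit points from densest to sparsest.
    order = np.argsort(-rho, kind="stable")
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]  # points visited earlier (density at least as high)
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    # Centers combine high density with a large distance to any denser point.
    gamma = (rho * delta).astype(float)
    gamma[order[0]] = np.inf  # the global density maximum is always a center
    centers = np.argsort(-gamma, kind="stable")[:n_centers]
    labels = np.full(n, -1)
    labels[centers] = np.arange(len(centers))
    # Every other point inherits the label of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

Under these assumptions, each window yielded by windowed_features could be passed to density_peaks, and the cluster assignments of consecutive windows compared to look for emerging, drifting, or forgotten concepts. The paper's weighted bipartite graph step is not sketched here.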
Funder
National Natural Science Foundation of China
Science Foundation of Anhui Province of China
Publisher
Association for Computing Machinery (ACM)