Affiliation:
1. University of Technology Sydney, Australia
Abstract
Online data stream mining is of great practical significance because data streams are ubiquitous in real-world scenarios, especially in the big data era. Traditional data mining algorithms cannot be applied directly to data streams because (1) the underlying data distribution may change over time (i.e., concept drift) and (2) labels for streaming data are often delayed, scarce, or entirely absent in practice. A new research area, named unsupervised concept drift detection, has emerged to tackle this difficulty, mainly based on two-sample hypothesis tests such as the Kolmogorov–Smirnov test. Surprisingly, none of the existing methods in this area exploit the Bayesian nonparametric hypothesis test, which offers clear interpretability and a straightforward way to encode prior knowledge, and which does not require fixing the form of the underlying data distribution in advance. In this article, we present a Bayesian nonparametric unsupervised concept drift detection method based on the Polya tree hypothesis test. The basic idea is to decompose the underlying data distribution into a multi-resolution representation, which transforms the hypothesis test on the whole distribution into a recursive sequence of simple binomial tests. In addition, an incremental mechanism is designed to improve efficiency in the stream setting. The method detects drifts effectively, locates where a drift happens, and provides the posterior probabilities of the hypotheses. Experiments on synthetic data verify the desired properties of the proposed method, and experiments on real-world data show that it performs better for data stream mining than its frequentist counterpart in the literature.
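To make the multi-resolution idea concrete, the following is a minimal sketch of a generic Polya tree two-sample test, not the authors' incremental algorithm. It assumes one-dimensional data already scaled to [0, 1), a dyadic partition of fixed depth, and a symmetric Beta(c·m², c·m²) prior on each split at level m; the function names, the `depth` and `c` parameters, and the rule of reporting the node with the most negative contribution as the drift location are all illustrative assumptions.

```python
import math
import random


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bin_counts(xs, depth):
    """Count the points of xs (assumed in [0, 1)) in each of 2**depth dyadic bins."""
    n_bins = 2 ** depth
    counts = [0] * n_bins
    for x in xs:
        counts[min(int(x * n_bins), n_bins - 1)] += 1
    return counts


def polya_tree_log_bf(xs, ys, depth=6, c=1.0):
    """Log Bayes factor of H0 (same distribution) against H1 (different).

    Returns the total and a dict mapping each tree node (level, index) to its
    contribution. Each node contributes one Beta-binomial comparison of the
    left/right split of the pooled sample against the two separate samples.
    """
    cx, cy = bin_counts(xs, depth), bin_counts(ys, depth)
    contributions, total = {}, 0.0
    # Walk from the finest level up, merging sibling bins into their parent.
    for level in range(depth, 0, -1):
        alpha = c * level ** 2  # assumed prior strength for splits into this level
        parent_cx, parent_cy = [], []
        for j in range(0, len(cx), 2):
            xl, xr, yl, yr = cx[j], cx[j + 1], cy[j], cy[j + 1]
            term = (log_beta(alpha + xl + yl, alpha + xr + yr)
                    + log_beta(alpha, alpha)
                    - log_beta(alpha + xl, alpha + xr)
                    - log_beta(alpha + yl, alpha + yr))
            contributions[(level - 1, j // 2)] = term
            total += term
            parent_cx.append(xl + xr)
            parent_cy.append(yl + yr)
        cx, cy = parent_cx, parent_cy
    return total, contributions


def posterior_h0(log_bf, prior_h0=0.5):
    """Posterior probability of H0 given the log Bayes factor (stable sigmoid)."""
    log_odds = log_bf + math.log(prior_h0 / (1.0 - prior_h0))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    rng = random.Random(0)
    reference = [rng.random() for _ in range(500)]          # uniform on [0, 1)
    shifted = [rng.betavariate(2, 5) for _ in range(500)]   # mass moved to the left
    log_bf, nodes = polya_tree_log_bf(reference, shifted)
    worst = min(nodes, key=nodes.get)  # node with the strongest evidence against H0
    print("log BF (H0 vs H1):", log_bf)
    print("P(H0 | data):", posterior_h0(log_bf))
    print("most drift-like node (level, index):", worst)
```

In this sketch a negative log Bayes factor favours drift, and the per-node contributions give a rough indication of the region and scale at which the two windows differ. Because the test depends only on bin counts, a streaming variant could in principle update the counts as points arrive instead of recomputing them, which is one plausible reading of the incremental mechanism mentioned above.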
Funder
Australian Research Council
Publisher
Association for Computing Machinery (ACM)
Subject
Artificial Intelligence, Theoretical Computer Science
Cited by
18 articles.