Abstract
Estimating dependencies from data is a fundamental task of Knowledge Discovery. Identifying the relevant variables leads to a better understanding of data and improves both the runtime and the outcomes of downstream Data Mining tasks. Dependency estimation from static numerical data has received much attention. However, real-world data often occurs as heterogeneous data streams: On the one hand, data is collected online and is virtually infinite. On the other hand, the various components of a stream may be of different types, e.g., numerical, ordinal or categorical. For this setting, we propose Monte Carlo Dependency Estimation (MCDE), a framework that quantifies multivariate dependency as the average statistical discrepancy between marginal and conditional distributions, via Monte Carlo simulations. MCDE handles heterogeneity by leveraging three statistical tests: the Mann–Whitney U, the Kolmogorov–Smirnov and the Chi-Squared test. We demonstrate that MCDE goes beyond the state of the art regarding dependency estimation by meeting a broad set of requirements. Finally, we show with a real-world use case that MCDE can discover useful patterns in heterogeneous data streams.
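The idea of averaging the discrepancy between marginal and conditional distributions over Monte Carlo iterations can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `ks_statistic` and `mc_dependency`, the parameters `iterations`, `slice_fraction` and `seed`, and the slicing scheme (a random contiguous rank interval on every non-reference column) are assumptions made for illustration. The sketch handles numerical columns only and uses the raw two-sample Kolmogorov–Smirnov statistic as the discrepancy; the Mann–Whitney U and Chi-Squared variants mentioned in the abstract are not covered.

```python
import random
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def mc_dependency(columns, iterations=200, slice_fraction=0.5, seed=0):
    """Average marginal-vs-conditional discrepancy over random slices.

    Hypothetical, simplified scheme (an assumption, not the paper's exact method):
    columns is a list of equally long numerical lists with at least two values each.
    """
    rng = random.Random(seed)
    d, n = len(columns), len(columns[0])
    width = max(2, int(n * slice_fraction))
    # Row indices of each column ordered by value, computed once.
    orders = [sorted(range(n), key=col.__getitem__) for col in columns]
    scores = []
    for _ in range(iterations):
        ref = rng.randrange(d)  # reference column for this iteration
        selected = set(range(n))
        for j in range(d):
            if j != ref:
                # Condition on a random contiguous rank interval of column j.
                start = rng.randint(0, n - width)
                selected &= set(orders[j][start:start + width])
        if len(selected) < 2:
            continue  # slice too small to compare; skip this iteration
        conditional = [columns[ref][i] for i in selected]
        # Discrepancy between the marginal and the conditional sample.
        scores.append(ks_statistic(columns[ref], conditional))
    return sum(scores) / len(scores) if scores else 0.0
```

Under these assumptions, a column pair linked by a monotone function should score clearly higher than a pair of independently drawn columns, because conditioning on one column then shifts the distribution of the other.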
Funder
Deutsche Forschungsgemeinschaft
Bundesministerium für Bildung und Forschung
Publisher
Springer Science and Business Media LLC
Subject
Information Systems and Management, Hardware and Architecture, Information Systems, Software
Cited by
3 articles.