Abstract
AbstractMost research in machine learning for data streams has focused on classification algorithms, whereas regression methods have received a lot less attention. This paper proposes Self-Optimising K-Nearest Leaves (SOKNL), a novel forest-based algorithm for streaming regression problems. Specifically, the Adaptive Random Forest Regression, a state-of-the-art online regression algorithm is extended like this: in each leaf, a representative data point – also called centroid – is generated by compressing the information from all instances in that leaf. During the prediction step, instead of letting all trees in the forest participate, the distances between the input instance and all centroids from relevant leaves are calculated, only k trees that possess the smallest distances are utilised for the prediction. Furthermore, we simplify the algorithm by introducing a mechanism for tuning the k values, which is dynamically and automatically optimised based on historical information. This new algorithm produces promising predictive results and achieves a superior ranking according to statistical testing when compared with several standard stream regression methods over typical benchmark datasets. This improvement incurs only a small increase in runtime and memory consumption over the basic Adaptive Random Forest Regressor.
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications,Computer Science Applications,Information Systems
Reference36 articles.
1. Almeida E, Ferreira C, Gama J (2013) Adaptive model rules from data streams. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 480–492. Springer
2. Arthur D, Vassilvitskii S (2006) k-means++: The advantages of careful seeding. Technical report, Stanford
3. Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM International Conference on Data Mining, pp 443–448. SIAM
4. Bifet A, Holmes G, Kirkby R, Pfahringer B (2010) MOA: massive online analysis. J Mach Learn Res 11:1601–1604
5. Bifet A, Holmes G, Pfahringer B (2010) Leveraging bagging for evolving data streams. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp 135–150. Springer
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献