Abstract
AbstractStream data mining aims to handle the continuous and ongoing generation of data flows (e.g. weather, stock and traffic data), which often encounters concept drift as time progresses. Traditional offline algorithms struggle with learning from real-time data, making online algorithms more fitting for mining the stream data with dynamic concepts. Among families of the online learning algorithms, single pass stands out for its efficiency in processing one sample point at a time, and inspecting it only once at most. Currently, there exist online algorithms tailored for single pass over the stream data by converting the problems of classification into minimum enclosing ball. However, these methods mainly focus on expanding the ball to enclose the new data. An excessively large ball might overwrite data of the new concept, creating difficulty in triggering the model updating process. This paper proposes a new online single pass framework for stream data mining, namely Scalable Concept Drift Adaptation (SCDA), and presents three distinct online methods (SCDA-I, SCDA-II and SCDA-III) based on that framework. These methods dynamically adjust the ball by expanding or contracting when new sample points arrive, thereby effectively avoiding the issue of excessively large balls. To evaluate their performance, we conduct the experiments on 7 synthetic and 5 real-world benchmark datasets and compete with the state-of-the-arts. The experiments demonstrate the applicability and flexibility of the SCDA methods in stream data mining by comparing three aspects: predictive performance, memory usage and scalability of the ball. Among them, the SCDA-III method performs best in all these aspects.
Funder
National Natural Science Foundation of China
Science Research Project of Hebei Education Department of China under Grant
Publisher
Springer Science and Business Media LLC
Reference38 articles.
1. Baidari I, Honnikoll N (2021) Bhattacharyya distance based concept drift detection method for evolving data stream. Expert Syst Appl 183:115303. https://doi.org/10.1016/j.eswa.2021.115303
2. Bifet A, Gavalda R (2007) Learning from time-changing data with adaptive windowing. In: Proceedings of the 2007 SIAM international conference on data mining. SIAM, pp 443–448. https://doi.org/10.1137/1.9781611972771.42
3. Bifet A, Holmes G, Pfahringer B et al (2010) Moa: Massive online analysis, a framework for stream classification and clustering. In: Proceedings of the first workshop on applications of pattern analysis. PMLR, pp 44–50
4. Butt UA, Amin R, Aldabbas H et al (2023) Cloud-based email phishing attack using machine and deep learning algorithm. Complex Intell Syst 9(3):3043–3070. https://doi.org/10.1007/s40747-022-00760-3
5. Fu H, Manogaran G, Wu K et al (2020) Intelligent decision-making of online shopping behavior based on internet of things. Int J Inf Manag 50:515–525. https://doi.org/10.1016/j.ijinfomgt.2019.03.010