Affiliation:
1. Institute of Business Administration Karachi
Abstract
In streaming data environments, data characteristics and probability distributions are likely to change over time. This phenomenon, called concept drift, makes it difficult for machine learning models to predict accurately. In such non-stationary environments, concept drift must be detected and the model updated to maintain acceptable predictive performance. Existing approaches to drift detection have inherent problems: supervised detection methods require ground-truth labels, and unsupervised drift detection suffers from a high false positive rate. In this paper, we propose a semi-supervised Autoencoder-based Drift Detection Method (AEDDM) that aims to detect drift without the need for ground-truth labels, yet with high confidence that the detected drift is real. In a binary classification setting, AEDDM uses two autoencoders in a layered architecture, trained on labelled data, and applies a thresholding mechanism based on reconstruction error to signal the presence of drift. The proposed method has been evaluated on four synthetic and four real-world datasets with different drift scenarios. For the real-world datasets, the induced and detected drifts have been evaluated from the viewpoint of classifier performance using seven widely used batch classifiers, and from an adaptation perspective in an online learning environment using a Hoeffding Tree classifier. The results show that AEDDM effectively detects the distributional changes in data that are most likely to impact the classifier’s performance (real drift) while ignoring virtual drift, thus considerably reducing false alarms, and that it supports adaptation in terms of classification performance.
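The thresholding idea in the abstract can be illustrated with a minimal sketch. It is not the AEDDM implementation: the abstract does not specify the layered two-autoencoder architecture or the exact threshold rule, so the sketch assumes a single PCA-based linear autoencoder as a stand-in, a hypothetical threshold of the mean plus `k` standard deviations of the reference reconstruction error, and synthetic data. The names `LinearAutoencoder`, `fit_threshold`, and `drift_detected` are illustrative only.

```python
import numpy as np


class LinearAutoencoder:
    """PCA-based linear autoencoder (assumed stand-in for a trained neural autoencoder)."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        # Principal directions come from the SVD of the centered reference data.
        _, _, vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def reconstruction_error(self, X):
        centered = X - self.mean_
        codes = centered @ self.components_.T   # encode
        recon = codes @ self.components_        # decode
        # Per-sample mean squared reconstruction error.
        return np.mean((centered - recon) ** 2, axis=1)


def fit_threshold(errors, k=3.0):
    # Assumed rule: mean + k * std of the reference errors (not taken from the paper).
    return errors.mean() + k * errors.std()


def drift_detected(model, batch, threshold):
    # Signal drift when the batch's average reconstruction error exceeds the threshold.
    return bool(model.reconstruction_error(batch).mean() > threshold)


# Synthetic data (assumption): the reference concept lies near a 2-D subspace of a 5-D space.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 5))
reference = rng.normal(size=(500, 2)) @ basis + 0.05 * rng.normal(size=(500, 5))
stable_batch = rng.normal(size=(200, 2)) @ basis + 0.05 * rng.normal(size=(200, 5))
drifted_batch = rng.normal(size=(200, 5))  # no longer confined to the subspace

model = LinearAutoencoder(n_components=2).fit(reference)
threshold = fit_threshold(model.reconstruction_error(reference))

stable_flag = drift_detected(model, stable_batch, threshold)
drift_flag = drift_detected(model, drifted_batch, threshold)
print("stable batch flagged:", stable_flag)
print("drifted batch flagged:", drift_flag)
```

No labels are needed at detection time in this sketch: the model is fitted once on reference data, and incoming batches are judged only by how poorly they are reconstructed.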
Publisher
Research Square Platform LLC