Abstract
In modern data analysis, time is often treated as just another feature. Yet time plays a special role that is regularly overlooked. Procedures are usually designed only for time-independent data and are therefore often unsuitable for the temporal aspect of the data. This is especially true of clustering algorithms. Although a few evolutionary approaches exist for time-dependent data, evaluating them, and therefore selecting among them, is difficult for the user. In this paper, we present a general evaluation measure that examines clusterings with respect to their temporal stability and thus provides information about the achieved quality. For this purpose, we examine the temporal stability of time series with respect to their cluster neighbors and the temporal stability of clusters with respect to their composition, and from these we draw conclusions about the temporal stability of the entire clustering. We summarise these components in a parameter-free toolkit that we call Cluster Over-Time Stability Evaluation (CLOSE). In addition, we present a fuzzy variant, which we call FCSETS (Fuzzy Clustering Stability Evaluation of Time Series). These toolkits enable a number of advanced applications. One of these is parameter selection for any type of clustering algorithm. We demonstrate parameter selection as an example and evaluate the results of classical clustering algorithms against a well-known evolutionary clustering algorithm. We then introduce a method for outlier detection in time series data based on CLOSE. We demonstrate the practicality of our approaches on three real-world data sets and one generated data set.
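To make the idea of "temporal stability of time series with respect to their cluster neighbors" concrete, the following is a minimal illustrative sketch. It is not the CLOSE or FCSETS definition, which this abstract does not give. It assumes a hard clustering per time point, that every series is present at every time point, and it swaps in a plain Jaccard overlap of cluster peers between consecutive time points as the stability score. The names `peers`, `neighbor_stability`, and `clustering_stability` are hypothetical.

```python
from typing import Dict, Hashable, List, Set

# One clustering per time point: series id -> cluster label (assumed hard labels).
Labels = Dict[Hashable, int]


def peers(labels: Labels, series: Hashable) -> Set[Hashable]:
    """Other series that share a cluster with `series` at one time point."""
    own = labels[series]
    return {s for s, c in labels.items() if c == own and s != series}


def neighbor_stability(clusterings: List[Labels], series: Hashable) -> float:
    """Mean Jaccard overlap of a series' cluster peers across consecutive time points.

    Illustrative stand-in only, not the measure defined in the paper.
    """
    scores = []
    for before, after in zip(clusterings, clusterings[1:]):
        a, b = peers(before, series), peers(after, series)
        union = a | b
        # A series that is alone at both time points counts as fully stable.
        scores.append(len(a & b) / len(union) if union else 1.0)
    return sum(scores) / len(scores) if scores else 1.0


def clustering_stability(clusterings: List[Labels]) -> float:
    """Average the per-series scores into one score for the whole clustering."""
    if not clusterings or not clusterings[0]:
        return 1.0
    ids = list(clusterings[0])
    return sum(neighbor_stability(clusterings, s) for s in ids) / len(ids)


# Three time points; series "b" moves from cluster 0 to cluster 1 at the end.
example = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 0, "c": 1},
    {"a": 0, "b": 1, "c": 1},
]
score_a = neighbor_stability(example, "a")  # 0.5: stable once, then loses its only peer
overall = clustering_stability(example)
```

A score of this kind could in principle be compared across parameter settings of a clustering algorithm, in the spirit of the parameter selection application mentioned above, but how CLOSE itself weights and combines its components is left to the paper.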
Funder
Heinrich-Heine-Universität Düsseldorf
Publisher
Springer Science and Business Media LLC