Author:
Vera J. Fernando, Angulo José M.
Abstract
Partitioning algorithms, and in particular K-means clustering, are widely used in time series analysis. K-means clustering is intrinsically related to the use of the Euclidean distance as a measure of dissimilarity. When other dissimilarity measures, such as dynamic time warping, are involved, K-means clustering is usually replaced by the optimisation of a sums-of-the-stars clustering criterion, giving rise to an algorithm other than that of K-means, such as K-medoids. Another common restriction in the implementation of K-means concerns the need to estimate the average as the cluster prototype, which may represent a drawback for this method in time series when elastic measures such as dynamic time warping are used. In this paper, we propose a multidimensional scaling based K-means clustering algorithm that enables the use of K-means clustering together with any dissimilarity measure, and in particular with dynamic time warping, without requiring us to estimate cluster prototypes for the time series. This procedure is a true K-means clustering algorithm that searches for the partition in an equivalent auxiliary configuration, usually in a dimension lower than the time series length. The approach proposed is of particular interest when dynamic time warping is used in the analysis of series of unequal length and/or when some data are missing, and hence Euclidean distances cannot be used. The performance of our procedure is tested by conducting an extensive Monte Carlo experiment, comparing the results with those obtained by K-medoids. The procedure is also illustrated with the analysis of carbon dioxide emissions from 133 countries.
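The pipeline outlined in the abstract (pairwise dissimilarities, an auxiliary configuration obtained by multidimensional scaling, then K-means on that configuration) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes classical (Torgerson) MDS, a basic dynamic time warping with absolute-difference local cost, Lloyd's algorithm with a deterministic farthest-first initialisation, and synthetic toy series of unequal length. All function names (`dtw_distance`, `classical_mds`, `kmeans`) are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Basic dynamic time warping with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m]

def classical_mds(D, dim):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred squared dissimilarities
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1][:dim]    # indices of the largest eigenvalues
    lam = np.clip(eigval[order], 0.0, None)   # assumption: negative eigenvalues are truncated
    return eigvec[:, order] * np.sqrt(lam)

def kmeans(X, k, n_iter=100):
    """Lloyd's algorithm; farthest-first initialisation is a simplification for this sketch."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic toy data (assumption): two groups of series with unequal lengths.
lengths = [8, 9, 10, 11, 12]
low = [0.5 * np.sin(np.linspace(0, 2 * np.pi, n)) for n in lengths]
high = [10.0 + 0.5 * np.sin(np.linspace(0, 2 * np.pi, n)) for n in lengths]
series = low + high

# Pairwise DTW dissimilarity matrix; Euclidean distance is undefined for unequal lengths.
N = len(series)
D = np.zeros((N, N))
for i in range(N):
    for j in range(i + 1, N):
        D[i, j] = D[j, i] = dtw_distance(series[i], series[j])

X = classical_mds(D, dim=2)   # auxiliary low-dimensional configuration
labels = kmeans(X, k=2)       # ordinary K-means on the configuration
print(labels)
```

In this sketch the cluster means are computed on the MDS coordinates, so no averaged time series prototype is ever needed. The embedding dimension (`dim=2`) is chosen only for illustration; how the paper selects the dimension of the auxiliary configuration is not stated in the abstract.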
Publisher
Springer Science and Business Media LLC
Subject
General Environmental Science; Safety, Risk, Reliability and Quality; Water Science and Technology; Environmental Chemistry; Environmental Engineering