Structural k-means (S k-means) and clustering uncertainty evaluation framework (CUEF) for mining climate data
Published: 2023-04-24
Issue: 8
Volume: 16
Page: 2215-2233
ISSN: 1991-9603
Container-title: Geoscientific Model Development
Language: en
Short-container-title: Geosci. Model Dev.
Author: Doan Quang-Van, Amagasa Toshiyuki, Pham Thanh-Ha, Sato Takuto, Chen Fei, Kusaka Hiroyuki
Abstract
Dramatic increases in climate data underlie a gradual paradigm shift in knowledge acquisition methods from physically based models to data-based mining approaches. One of the most popular data clustering/mining techniques is k-means, which has been used to detect hidden patterns in climate systems. However, k-means relies on distance metrics for pattern recognition, which are relatively ineffective when dealing with “structured” data, that is, data in time and space domains, which are dominant in climate science. Here, we propose (i) a novel structural-similarity-recognition-based k-means algorithm, called structural k-means or S k-means, for climate data mining and (ii) a new clustering uncertainty representation/evaluation framework based on the information entropy concept. We demonstrate that S k-means can provide higher-quality clustering outcomes in terms of general silhouette analysis, although it requires more computational resources than conventional algorithms. The results are consistent across different demonstration problem settings using different types of input data, including two-dimensional weather patterns, historical climate change in terms of time series, and tropical cyclone paths. Additionally, by quantifying the uncertainty underlying the clustering outcomes, we evaluated, for the first time, the “meaningfulness” of applying a given clustering algorithm to a given dataset. We expect that this study will constitute a new standard of k-means clustering with “structural” input data, as well as a new framework for uncertainty representation/evaluation of clustering algorithms for (but not limited to) climate science.
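
The abstract does not give the S k-means update rule or the exact entropy formulation, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a single-window SSIM-style index as the structural similarity measure, plain averaging for the centroid update, and Shannon entropy (in bits) as the uncertainty measure. The names `ssim_global`, `structural_kmeans`, and `shannon_entropy`, and the constants `c1` and `c2`, are hypothetical choices made for illustration.

```python
import numpy as np


def ssim_global(x, y, c1=1e-4, c2=9e-4):
    """SSIM-style index computed over the whole array (one window).

    Assumption: this stands in for the paper's structural similarity;
    c1 and c2 are small stabilising constants chosen for illustration.
    """
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    num = (2 * mx * my + c1) * (2 * cov + c2)
    den = (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    return num / den


def structural_kmeans(data, k, n_iter=50, seed=0):
    """k-means loop that assigns each sample to the MOST SIMILAR centroid
    (highest SSIM-style index) instead of the nearest one in Euclidean distance."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Initialise centroids with k distinct samples.
    centroids = data[rng.choice(len(data), size=k, replace=False)].copy()
    labels = np.full(len(data), -1)
    for _ in range(n_iter):
        sim = np.array([[ssim_global(x, c) for c in centroids] for x in data])
        new_labels = sim.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = data[labels == j]
            if len(members):  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return labels, centroids


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution.

    Assumption: applied to, e.g., the fractions of repeated runs that place
    a sample in each cluster; 0 means a fully certain assignment.
    """
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-(p * np.log2(p)).sum())


if __name__ == "__main__":
    t = np.linspace(0, 2 * np.pi, 64, endpoint=False)
    patterns = np.stack([np.sin(t), -np.sin(t)])
    labels, _ = structural_kmeans(patterns, k=2)
    print(labels, shannon_entropy([0.5, 0.5]))
```

In this sketch the pairwise similarity is evaluated in a Python loop for every sample and centroid at every iteration, which is one way to see why a structural-similarity-based variant would cost more than a conventional distance-based k-means, as the abstract notes.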
Funder
University of Tsukuba
Publisher
Copernicus GmbH
Cited by
1 article.