Geometry-Inference based Clustering-Heuristic: An empirical method for kmeans optimal clusters determination

Author:

Khattabi Mohammed Zakariae El (1), Jai Mostapha El (2), Akhrif Iatimad (2), Lahmadi Youssef (1), Oughdir Lahcen (1)

Affiliation:

1. Engineering, System and Applications Laboratory, Ecole Nationale des Sciences Appliquées, Sidi Mohamed Ben Abdellah University

2. Euromed Center of Research, Euromed Polytechnic School, Euromed University of Fes

Abstract

K-means is one of the most widely used algorithms for data clustering. A number of metrics are coupled with k-means to cluster data so as to improve both the local compactness of clusters and their global separation. Before data points are finally assigned to their clusters, selecting the optimal number of clusters is therefore a crucial step in the clustering process. The present work aims to build a new clustering metric/heuristic that takes into account both the spatial dispersion and the inferential characteristics of the data to be clustered. To that end, this paper proposes a Geometry-Inference based Clustering (GIC) heuristic for selecting the optimal number of clusters. The approach introduces the “Initial speed rate” as the main geometric parameter to be studied inferentially. The corresponding histograms are then fitted with classical distributions. A clear linear behaviour of the distributions’ parameters as a function of the number of clusters was detected for each of the 14 datasets adopted in this work. For each dataset, the optimal k* is observed to coincide with the change-point, defined as the intersection of two clearly salient lines. All distribution fittings are tested using chi-squared (χ²) tests, which show excellent fits in terms of p-values; the linear fittings are additionally assessed with R². A change-point algorithm is then run to select k*. In summary, the GIC heuristic is fully quantitative and fully automated; no qualitative index or graphical technique is used.
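
The abstract describes k* as the intersection of two salient lines located by a change-point algorithm, without giving the algorithm itself. The sketch below is only an illustration of that general idea, not the authors' GIC implementation: it assumes a generic two-segment least-squares fit, and the function name `two_line_change_point`, the `min_points` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np


def two_line_change_point(k_values, y_values, min_points=2):
    """Illustrative two-segment linear fit (an assumption, not the GIC method).

    Tries every split of the points into a left and a right segment, fits a
    straight line to each by least squares, keeps the split with the smallest
    total squared error, and returns the abscissa where the two lines cross.
    """
    k = np.asarray(k_values, dtype=float)
    y = np.asarray(y_values, dtype=float)
    n = len(k)
    if n < 2 * min_points:
        raise ValueError("not enough points for two segments")

    best = None  # (total squared error, split index, left fit, right fit)
    for split in range(min_points, n - min_points + 1):
        fits = []
        sse = 0.0
        for ks, ys in ((k[:split], y[:split]), (k[split:], y[split:])):
            slope, intercept = np.polyfit(ks, ys, 1)
            residuals = ys - (slope * ks + intercept)
            sse += float(residuals @ residuals)
            fits.append((slope, intercept))
        if best is None or sse < best[0]:
            best = (sse, split, fits[0], fits[1])

    _, split, (m1, b1), (m2, b2) = best
    if abs(m1 - m2) < 1e-12:
        # Parallel lines never cross; fall back to the segment boundary.
        return float(k[split - 1])
    return float((b2 - b1) / (m1 - m2))


# Synthetic, made-up parameter values with a kink at k = 4.
ks = list(range(1, 11))
ys = [20 - 4 * v if v <= 4 else 6 - 0.5 * v for v in ks]
print(round(two_line_change_point(ks, ys)))
```

In this sketch the returned abscissa is rounded to the nearest integer to obtain a candidate number of clusters; how the paper maps the change-point to k* is not stated in the abstract.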

Publisher

Research Square Platform LLC
