Viscous Gravity Algorithm for Clustering Multidimensional Data

Author:

Golovinsky Pavel¹, Tarasova Anna¹

Affiliation:

1. Voronezh State Technical University

Abstract

Clustering is one of the first standard steps in big data analysis; it is a prerequisite for subsequent classification and group forecasting tasks. We study a viscous modification of the gravitational data clustering algorithm (VGSA), which develops an already proven approach. In VGSA, individual data records are treated as points in a multidimensional space, between which a pairwise central attraction acts. The masses of the interacting points are assumed to be equal, which reflects the specifics of clustering, in contrast to the problem of optimizing an objective function, where particle masses grow as the particles approach the extremum. The choice of the pair interaction type, depending on the assumed data structure, is discussed. High viscosity lowers the order of the dynamic equations of motion by eliminating the acceleration term. The resulting reduced, first-order equations describe stable motion of the system, which guarantees reproducible results when the algorithm is restarted. Stability of the system of equations is proved using a Lyapunov function that is an analogue of the physical potential energy. Switching off the interaction between particles at small separations provides an automatic mechanism for hierarchical clustering at different stages of the algorithm, ending with the formation of a single cluster. The relationship between VGSA and the operating principle of Kohonen's self-organizing maps, which corresponds to the gravitational redistribution of test particles, is traced. The algorithm was tested on a database in comparison with K-means clustering, Kohonen maps and the standard gravitational algorithm, and the speed and accuracy of clustering were evaluated.
We conclude that VGSA is advantageous for big data, given that it determines the number of clusters automatically, allows correction when records are updated, and remains applicable when data are specified inaccurately.
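The abstract describes the dynamics only in words. Below is a minimal sketch in Python/NumPy under loudly stated assumptions, not the authors' implementation: an inverse-square attraction of magnitude G/r² between equal unit masses, unit viscosity (so each point's velocity equals the net force on it, with no acceleration term), explicit Euler time steps of size dt, and a hard cutoff eps below which a pair stops interacting. The potential energy U = Σ_{i<j} −G / max(r_ij, eps) plays the role of the Lyapunov function, since the first-order flow is dx/dt = −∇U. Reading clusters off as connected components of points closer than eps is also an assumption, and all function names (viscous_gravity_step, potential_energy, cluster_labels, vgsa) are hypothetical.

```python
import numpy as np


def viscous_gravity_step(x, G=1.0, eps=0.1, dt=1e-4):
    """One explicit Euler step of the overdamped (viscous) pair-attraction flow.

    Assumed model: with acceleration dropped, each point moves with velocity
    equal to the net force on it (unit viscosity, equal unit masses):
        dx_i/dt = G * sum over j with r_ij > eps of (x_j - x_i) / r_ij**3
    Pairs closer than eps (and i == j) do not interact.
    """
    diff = x[None, :, :] - x[:, None, :]        # diff[i, j] = x_j - x_i
    r = np.linalg.norm(diff, axis=-1)           # pairwise distances r_ij
    active = r > eps                            # cutoff: no force at r <= eps
    safe_r = np.where(active, r, 1.0)           # avoid 0/0 on inactive pairs
    w = np.where(active, G / safe_r**3, 0.0)    # G/r**2 magnitude, /r normalizes diff
    velocity = (w[:, :, None] * diff).sum(axis=1)
    return x + dt * velocity


def potential_energy(x, G=1.0, eps=0.1):
    """Lyapunov candidate U = sum_{i<j} -G / max(r_ij, eps); the flow is -grad U."""
    r = np.linalg.norm(x[None, :, :] - x[:, None, :], axis=-1)
    iu = np.triu_indices(len(x), k=1)           # each unordered pair once
    return float(np.sum(-G / np.maximum(r[iu], eps)))


def cluster_labels(x, eps=0.1):
    """Connected components of the graph linking points at distance <= eps."""
    n = len(x)
    r = np.linalg.norm(x[None, :, :] - x[:, None, :], axis=-1)
    labels = np.full(n, -1, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:                            # depth-first flood fill
            i = stack.pop()
            for j in np.flatnonzero((r[i] <= eps) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


def vgsa(x, G=1.0, eps=0.1, dt=1e-4, n_steps=2000):
    """Hypothetical driver: integrate the viscous flow, then read off clusters."""
    x = np.asarray(x, dtype=float).copy()
    for _ in range(n_steps):
        x = viscous_gravity_step(x, G, eps, dt)
    return x, cluster_labels(x, eps)
```

In this sketch, letting the integration run longer also pulls well-separated groups together, so the number of components shrinks toward the single final cluster mentioned in the abstract; n_steps therefore acts as the knob that selects a level of the hierarchy. The step size dt must stay small relative to eps³/G, otherwise a single Euler step can overshoot and the discrete potential energy is no longer guaranteed to decrease.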

Publisher

Baikal State University
