Affiliation:
1. School of Management, Henan Institute of Economics and Trade, Zhengzhou 450046, China
Abstract
With the rapid development of network and database technology, computers can now store massive volumes of data. Traditional data analysis and processing tools such as management information systems, however, can process these data only superficially, and their capacity for deeper analysis is unsatisfactory. The gap between the ability to supply data and the ability to analyze it is becoming increasingly prominent, creating an urgent need for automated techniques that can process data in depth. Data mining technology emerged in response. Cluster analysis, an important topic in data mining, divides data into natural groups and describes the characteristics of each group; it is a basic method of data mining and knowledge discovery. It performs unsupervised classification of data, without prior knowledge or guidance. With appropriate use of advanced algorithms, it can uncover hidden valuable information, improve the quality of data analysis and interpretation, and provide a scientific basis for judgment when other analysis and sorting tools reprocess or interpret the data. This paper first briefly introduces the principle, development, and methods of cluster analysis and describes its applications. It then explains the principle of the R-means clustering algorithm, analyzes the advantages and disadvantages of the basic R-means clustering algorithm, and reviews several existing improvement methods. Finally, an improved R-means clustering algorithm and a clustering analysis model based on the R-means clustering algorithm are proposed, together with the corresponding algorithm flow and implementation.