Affiliation:
1. Postdoctoral Workstation, Party School of Anhui Provincial Committee of the Communist Party of China (Anhui Academy of Governance), Hefei 230022, Anhui, China
2. School of Management, Hefei University of Technology, Hefei 230009, Anhui, China
Abstract
This paper presents a new clustering algorithm for metadata trees based on the K-prototypes algorithm, the glowworm swarm optimization (GSO) algorithm, and the maximal frequent path (MFP) method. Metadata tree clustering involves two steps: computing a feature vector for each metadata tree and clustering those feature vectors. Traditional data clustering methods are therefore not directly applicable to metadata trees. MFP, the main method for computing the feature vectors, suffers from high computational complexity and loss of key information. The K-prototypes algorithm is generally suitable for clustering mixed-attribute data such as these feature vectors, but it is sensitive to the choice of initial cluster centers. Compared with other swarm intelligence algorithms, GSO has a more efficient global search capability, which makes it suitable for multimodal problems and useful for optimizing the K-prototypes algorithm. To improve clustering accuracy and cope with high data dimensionality when clustering metadata tree structures, this paper combines GSO, K-prototypes, and MFP to design a new metadata structure clustering method. First, MFP is used to describe metadata tree features, and a key categorical-data parameter is introduced into the MFP feature vector so that it describes the metadata tree more accurately. Second, GSO is combined with K-prototypes to design GSOKP, which clusters feature vectors containing both numeric and categorical data, in order to improve clustering accuracy. Finally, the method is tested on a set of metadata trees. The experimental results show that the proposed metadata tree clustering method, GSOKP-FP, has certain advantages in clustering accuracy and time complexity.
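To make the mixed-attribute clustering step more concrete, the following is a minimal sketch of the standard K-prototypes dissimilarity (squared Euclidean distance on numeric attributes plus a weighted count of categorical mismatches) and the nearest-prototype assignment built on it. It is not the GSOKP method itself: the GSO-based initialization and the MFP feature extraction are not shown, and the function names, the weight name `gamma`, and the example values are assumptions chosen for illustration.

```python
from typing import Sequence, Tuple

Prototype = Tuple[Sequence[float], Sequence[str]]


def mixed_dissimilarity(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    proto_num: Sequence[float],
    proto_cat: Sequence[str],
    gamma: float = 1.0,
) -> float:
    """Standard K-prototypes cost between one object and one prototype.

    gamma (an illustrative name) weights the categorical part against
    the numeric part.
    """
    # Numeric attributes: squared Euclidean distance.
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    # Categorical attributes: simple matching (count of mismatches).
    categorical_part = sum(1 for a, b in zip(x_cat, proto_cat) if a != b)
    return numeric_part + gamma * categorical_part


def assign_to_prototype(
    x_num: Sequence[float],
    x_cat: Sequence[str],
    prototypes: Sequence[Prototype],
    gamma: float = 1.0,
) -> int:
    """Return the index of the prototype with the lowest mixed cost."""
    costs = [
        mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma)
        for p_num, p_cat in prototypes
    ]
    return min(range(len(costs)), key=costs.__getitem__)


if __name__ == "__main__":
    # Hypothetical feature vectors: one numeric and one categorical attribute.
    prototypes = [([0.0], ["x"]), ([10.0], ["y"])]
    print(assign_to_prototype([1.0], ["y"], prototypes, gamma=1.0))
```

Because the assignment depends on where the prototypes start, a poor initial choice can lead to a poor local optimum, which is the sensitivity that the abstract proposes to address with GSO.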
Funder
National Social Science Foundation of China
Subject
General Engineering,General Mathematics
Cited by
4 articles.