Abstract
The density peak clustering (DPC) algorithm finds cluster centers by computing the local density and distance of each data point from the pairwise distances between points and a manually set cutoff distance (dc). The pairwise distance is usually just the Euclidean distance. However, when the density distribution of a data set is uneven, so that high-density and low-density points coexist, and dc is set manually and arbitrarily, the clustering results of DPC degrade severely. To address this, a clustering algorithm that combines teaching-learning-based optimization with a density gap is proposed (NSTLBO-DGDPC). First, to account for the influence of data point attributes and neighborhoods, a density difference distance replaces the Euclidean distance of the original algorithm. Second, because manual selection of cluster centers may produce incorrect clustering results, the standard deviation of the high-density distance is used to determine the cluster centers. Finally, the teaching-learning-based optimization (TLBO) algorithm is used to search for the optimal dc. To keep the search from falling into a local optimum, a niche selection strategy is introduced to eliminate similar individuals once the population density reaches a certain threshold, and a nonlinear decreasing strategy is then used to update the students in the teaching and learning stages to obtain the optimal dc. The accuracy and convergence of the improved TLBO algorithm (NSTLBO) are verified on ten benchmark functions, and the simulation experiments show that NSTLBO performs better. The proposed NSTLBO-DGDPC algorithm is validated on eight synthetic data sets and eight real data sets, and the simulation results show that it achieves better clustering quality.
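To make the role of dc concrete, the following is a minimal sketch of the two quantities that baseline DPC computes, assuming the standard cutoff-kernel formulation with Euclidean distance. It is not the NSTLBO-DGDPC method proposed here: the density difference distance, the standard-deviation rule for picking centers, and the TLBO search for dc are not reproduced. The function name `dpc_rho_delta` and its parameters are hypothetical, chosen for illustration only.

```python
import numpy as np

def dpc_rho_delta(points, dc):
    """Baseline DPC quantities (illustrative sketch, assumes dc > 0).

    rho[i]   = number of other points within the cutoff distance dc of point i
    delta[i] = distance from point i to the nearest point of higher density
               (for a point with no denser point: its largest distance to any point)
    """
    points = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Count neighbours closer than dc; subtract 1 to exclude the point itself.
    rho = (dist < dc).sum(axis=1) - 1
    n = len(points)
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]  # points strictly denser than point i
        if higher.any():
            delta[i] = dist[i, higher].min()
        else:
            delta[i] = dist[i].max()
    return rho, delta
```

Points with both large rho and large delta are the candidate cluster centers. Because rho is a count against a hard threshold, a small change in dc can change which points rank as densest, which is the sensitivity the abstract describes. Ties in rho are not broken in this sketch, so several equally dense points can all receive the fallback delta.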
Funder
National Outstanding Youth Science Fund Project of National Natural Science Foundation of China
Special Foundation of Scientific and Technological Innovation for Young Scientists of Harbin, China
Publisher
Springer Science and Business Media LLC
Cited by
4 articles.