Affiliation:
1. The School of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China
Abstract
The find of density peak clustering algorithm (FDP) has poor performance on high-dimensional data. This problem occurs because the clustering algorithm ignores the feature selection. All features are evaluated and calculated under the same weight, without distinguishing. This will lead to the final clustering effect which cannot achieve the expected. Aiming at this problem, we propose a new method to solve it. We calculate the importance value of all features of high-dimensional data and calculate the mean value by constructing random forest. The features whose importance value is less than 10% of the mean value are removed. At this time, we extract the important features to form a new dataset. At this time, improved t-SNE is used for dimension reduction, and better performance will be obtained. This method uses t-SNE that is improved by the idea of random forest to reduce the dimension of the original data and combines with improved FDP to compose the new clustering method. Through experiments, we find that the evaluation index NMI of the improved algorithm proposed in this paper is 23% higher than that of the original FDP algorithm, and 9.1% higher than that of other clustering algorithms (
-means, DBSCAN, and spectral clustering). It has good performance in high-dimensional datasets that are verified by experiments on UCI datasets and wireless sensor networks.
Funder
Northwest Normal University
Subject
Electrical and Electronic Engineering,Computer Networks and Communications,Information Systems
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献