Affiliation:
1. School of Cyber Science and Engineering, Sichuan University, Chengdu, China
Abstract
DBSCAN (density-based spatial clustering of applications with noise) is one of the most widely used density-based clustering algorithms. It can find clusters of arbitrary shape, determine the number of clusters, and identify noise samples automatically. However, its performance is significantly limited by its sensitivity to the parameters eps and MinPts. Eps is the radius of a sample's neighborhood, and MinPts is the minimum number of points that neighborhood must contain. Additionally, because these parameters are fixed, DBSCAN tends to perform poorly on datasets with large variations in density. To overcome these limitations, we propose a new density-based clustering algorithm called GNN-DBSCAN, which uses an adaptive Grid to divide the dataset and defines local core samples by using the Nearest Neighbor. With the help of the grid, the data space is divided into a finite number of cells. The nearest neighbors lying in each filled cell and its adjacent filled cells are then defined as the local core samples. Next, GNN-DBSCAN obtains global core samples by enhancing and screening the local core samples. In this way, our algorithm can identify higher-quality core samples than DBSCAN. Lastly, given these global core samples, the algorithm clusters the dataset using a dynamic radius based on k-nearest neighbors. The dynamic radius overcomes the problems that DBSCAN's fixed parameter eps causes. Therefore, our method performs better on datasets with large variations in density. Experiments were conducted on synthetic and real-world datasets. The results indicate that the average Adjusted Rand Index (ARI), Normalized Mutual Information (NMI), Adjusted Mutual Information (AMI), and V-measure of the proposed algorithm exceed those of the existing algorithms DBSCAN, DPC, ADBSCAN, and HDBSCAN.
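As a rough illustration of two ideas named in the abstract, the following minimal Python sketch divides 2D points into grid cells and computes a per-point radius from the k-th nearest neighbor. It is not the authors' GNN-DBSCAN implementation: the fixed `cell_size`, the function names `assign_to_grid` and `knn_radius`, and the toy data are all assumptions made for this example, and the abstract does not specify how the adaptive grid or the dynamic radius is actually computed.

```python
import math
from collections import defaultdict


def assign_to_grid(points, cell_size):
    """Map each grid cell (ix, iy) to the indices of the points inside it.

    Assumption: a uniform cell size; the paper's grid is adaptive.
    """
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[cell].append(idx)
    return dict(cells)


def knn_radius(points, k):
    """Return, for each point, the distance to its k-th nearest other point.

    Assumption: this stands in for a 'dynamic radius based on k-nearest
    neighbors'; the exact definition in the paper may differ.
    """
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


# Toy data: a dense group near the origin and a sparse group farther away.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]

cells = assign_to_grid(points, cell_size=1.0)
radii = knn_radius(points, k=2)

for cell, members in sorted(cells.items()):
    print(cell, members)
print([round(r, 3) for r in radii])
```

In this toy dataset the radii in the sparse group come out larger than those in the dense group, which is the behavior a single fixed eps cannot provide.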
Subject
Artificial Intelligence, General Engineering, Statistics and Probability
Cited by
2 articles.