ANDClust: An Adaptive Neighborhood Distance‐Based Clustering Algorithm to Cluster Varying Density and/or Neck‐Typed Datasets

Published:2024-03-08 Issue:4 Volume:7
ISSN:2513-0390
Container-title:Advanced Theory and Simulations
language:en
Short-container-title:Advcd Theory and Sims

Author:

Şenol Ali¹

Affiliation:

1. Computer Engineering Department Faculty of Engineering Tarsus University Mersin 33400 Turkey

Abstract

Although density‐based clustering algorithms can successfully define clusters of arbitrary shape, they encounter problems when the dataset has varying densities or neck‐typed clusters, because they require precise distance parameters such as the eps parameter of DBSCAN. These approaches assume that data density is homogeneous, but this is rarely the case in practice. In this study, a new clustering algorithm named ANDClust (Adaptive Neighborhood Distance‐based Clustering Algorithm) is proposed to handle datasets with varying density and/or neck‐typed clusters. The algorithm consists of three parts. The first part uses Multivariate Kernel Density Estimation (MulKDE) to find the dataset's peak points, which serve as the starting points from which a Minimum Spanning Tree (MST) constructs clusters in the second part. Lastly, an Adaptive Neighborhood Distance (AND) ratio is used to weight the distance between data pairs. This enables the approach to support inter‐cluster and intra‐cluster density variations by acting as if the distance parameter differed for each data point in the dataset. ANDClust is tested on synthetic and real datasets to evaluate its efficiency. The algorithm shows superior clustering quality with a good run time compared to its competitors. Moreover, ANDClust can effectively define clusters of arbitrary shape and process high‐dimensional, imbalanced datasets that may contain outliers.
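
The three ingredients named in the abstract (kernel density estimation, a minimum spanning tree, and a distance threshold that adapts per point) can be illustrated with a rough sketch. This is not the ANDClust algorithm itself, whose details are not given in this record. The function name `adaptive_mst_clusters`, the parameters `k` and `ratio`, and the use of the mean k-nearest-neighbor distance as a local scale are all assumptions made for illustration. The sketch also simplifies the role of the density peaks: it only reports the densest point of each cluster instead of using peaks as starting points for the tree.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import cdist
from scipy.stats import gaussian_kde


def adaptive_mst_clusters(X, k=4, ratio=2.0):
    """Illustrative stand-in, NOT the published ANDClust procedure.

    k and ratio are hypothetical parameters chosen for this sketch.
    """
    X = np.asarray(X, dtype=float)
    n = len(X)
    dist = cdist(X, X)  # full pairwise Euclidean distances

    # Assumed local scale: mean distance to the k nearest neighbors
    # (column 0 of the sorted rows is the point itself, at distance 0).
    local_scale = np.sort(dist, axis=1)[:, 1:k + 1].mean(axis=1)

    # Minimum spanning tree over the complete distance graph.
    mst = minimum_spanning_tree(dist).tocoo()

    # Keep an edge only if it is short relative to the local scale of its
    # endpoints, so the effective threshold differs from point to point.
    limit = ratio * np.maximum(local_scale[mst.row], local_scale[mst.col])
    keep = mst.data <= limit
    pruned = csr_matrix(
        (mst.data[keep], (mst.row[keep], mst.col[keep])), shape=(n, n)
    )

    # Each connected component of the pruned tree is one cluster.
    n_clusters, labels = connected_components(pruned, directed=False)

    # Kernel density estimate, used here only to report the densest
    # point (the "peak") of each cluster.
    density = gaussian_kde(X.T)(X.T)
    peaks = np.array([
        np.flatnonzero(labels == c)[np.argmax(density[labels == c])]
        for c in range(n_clusters)
    ])
    return labels, peaks


# Two grids with a tenfold difference in point spacing.
grid = np.array([(i, j) for i in range(5) for j in range(5)], dtype=float)
X = np.vstack([0.1 * grid, 1.0 * grid + 20.0])
labels, peaks = adaptive_mst_clusters(X)
print(len(set(labels)), "clusters; peak indices:", peaks)
```

Because the threshold scales with each point's own neighborhood, the tightly spaced grid and the loosely spaced grid are each kept whole without choosing a single global distance parameter.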

Publisher

Wiley

Link

https://onlinelibrary.wiley.com/doi/pdf/10.1002/adts.202301113
