Improved Constrained k-Means Algorithm for Clustering with Domain Knowledge

Published: 2021-09-26 Issue: 19 Volume: 9 Page: 2390
ISSN: 2227-7390
Container-title: Mathematics
Language: en
Short-container-title: Mathematics

Authors:

Huang Peihuang, Yao Pei, Hao Zhendong, Peng Huihong, Guo Longkun

Abstract

With the tremendous development of machine learning technology, emerging applications pose the challenge of using domain knowledge to improve clustering accuracy: although clustering is fast, its accuracy is often compromised. In this paper, we model domain knowledge (i.e., background knowledge or side information) arising in certain applications as must-link and cannot-link sets so that it can be combined with k-means for better accuracy. We first propose an algorithm for constrained k-means that considers only must-links. The key idea is to treat a set of data points constrained by must-links as a single data point whose weight equals the sum of the weights of the constrained points. Then, to cluster data points subject to cannot-links, we employ minimum-weight matching to assign them to the existing clusters. Finally, we carry out numerical simulations on UCI datasets to evaluate the proposed algorithms, demonstrating that our method outperforms previous constrained k-means algorithms and traditional k-means in clustering accuracy, at the cost of a slightly longer practical runtime.
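The two steps described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and several details are assumptions made here: the function names `merge_must_links` and `assign_cannot_link_group` are hypothetical; a merged must-link group is placed at the weighted mean of its members (this preserves the k-means cost up to a constant, because for any center c, Σ w_i‖x_i − c‖² = W‖x̄ − c‖² + Σ w_i‖x_i − x̄‖²); and the cannot-link step assumes that a set of mutually cannot-linked points must go to distinct clusters, with cost equal to weight times squared distance to the center. The brute-force search over assignments stands in for a real minimum-weight matching solver.

```python
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def merge_must_links(points, must_links, weights=None):
    """Collapse each must-link component into one weighted point.

    Returns a list of (centroid, total_weight, member_indices).
    Assumption (not from the paper): the merged point sits at the
    weighted mean of its members.
    """
    n = len(points)
    weights = [1.0] * n if weights is None else weights
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in must_links:
        parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)

    dim = len(points[0])
    merged = []
    for members in groups.values():
        total_w = sum(weights[m] for m in members)  # weight = sum of member weights
        centroid = tuple(
            sum(weights[m] * points[m][d] for m in members) / total_w
            for d in range(dim)
        )
        merged.append((centroid, total_w, members))
    return merged


def assign_cannot_link_group(group, centers):
    """Send mutually cannot-linked (point, weight) pairs to distinct centers.

    Minimises total weighted squared distance by brute force over all
    injective assignments; a stand-in for a minimum-weight matching solver.
    """
    if len(group) > len(centers):
        raise ValueError("more cannot-linked points than clusters")
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(centers)), len(group)):
        cost = sum(w * sq_dist(p, centers[c]) for (p, w), c in zip(group, perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return list(best)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    print(merge_must_links(pts, [(0, 1), (2, 3)]))
    centers = [(0.0, 0.0), (4.0, 0.0)]
    print(assign_cannot_link_group([((2.0, 0.0), 1.0), ((1.0, 0.0), 10.0)], centers))
```

On the sample data, the first call yields two merged points of weight 2.0 at (0.0, 1.0) and (10.0, 1.0), and the second returns [1, 0]: the heavy point at (1.0, 0.0) keeps its nearer center, while the light point, equidistant from both centers, takes the other. Brute force grows factorially with the number of clusters; the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment` on a weighted cost matrix) solves the same rectangular assignment problem in polynomial time.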

Funders

National Natural Science Foundation of China

Natural Science Foundation of Fujian Province

Publisher

MDPI AG

Subject

General Mathematics, Engineering (miscellaneous), Computer Science (miscellaneous)

Link

https://www.mdpi.com/2227-7390/9/19/2390/pdf

References: 32 articles

1. Clustering with instance-level constraints; Wagstaff; AAAI/IAAI, 2000

2. AGNES: A Novel Algorithm for Visualising Diversified Graphical Entity Summarisations on Knowledge Graphs

3. The seeding algorithm for k-means problem with penalties

Cited by 5 articles.

1. Exploring community detection methods and their diverse applications in complex networks: a comprehensive review; Social Network Analysis and Mining; 2024-06-12

2. Early Detection of Mental and Behavioral Health Issues from High-School Academic Performance; 2023 International Conference on Machine Learning and Applications (ICMLA); 2023-12-15

3. Unsupervised Machine Learning Techniques for Improving Reservoir Interpretation Using Walkaway VSP and Sonic Log Data; Energies; 2023-01-02

4. Real-Time Detection of Small Objects Based on Lightweight YOLOv4; Laser & Optoelectronics Progress; 2023

5. SRG: a clustering algorithm based on scale division and region growing; Cluster Computing; 2022-12-24