Affiliation:
1. National University of Defense Technology
Abstract
The K-means algorithm is widely used for text clustering. However, traditional K-means is sensitive to the choice of initial centers: the clustering result depends excessively on them, and the output fluctuates considerably for different inputs. The proposed method combines K-means with a feature dictionary and density-based outlier detection to detect outliers in text data. In the first stage, every data object is assigned a density parameter computed with a custom distance function. In the second stage, K-means clusters the data based on the density distribution: K data objects that lie in high-density areas and are farthest from one another are chosen as the initial cluster centers. In the third stage, an outlier detection algorithm identifies the anomalous text sets from the clustering result. Experimental results show that the proposed approach can efficiently detect outliers in the data set.
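The three stages can be illustrated with a minimal sketch. It is not the authors' implementation, and several details the abstract leaves open are filled with labeled assumptions: texts are assumed to be already vectorized over the feature dictionary; plain Euclidean distance stands in for the unspecified custom distance function; density is taken as the number of neighbors within a radius `eps`; "high density" means density at or above the median; and the stage-3 rule (flag a point farther than `factor` times its cluster's median member distance from its center) is a hypothetical choice. All function names are illustrative.

```python
import math
from statistics import median


def dist(a, b):
    # Assumption: Euclidean distance as a stand-in for the paper's custom distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def densities(points, eps):
    # Stage 1 (assumed definition): density = number of other points within eps.
    return [sum(1 for j, q in enumerate(points) if j != i and dist(p, q) <= eps)
            for i, p in enumerate(points)]


def init_centers(points, dens, k):
    # Stage 2a: keep high-density candidates (density >= median, an assumed cutoff),
    # start from the densest point, then repeatedly add the candidate whose
    # nearest already-chosen center is farthest away (farthest-first selection).
    cutoff = median(dens)
    cand = [i for i, d in enumerate(dens) if d >= cutoff]
    if len(cand) < k:
        cand = list(range(len(points)))  # fall back to all points
    chosen = [max(cand, key=lambda i: dens[i])]
    while len(chosen) < k:
        chosen.append(max((i for i in cand if i not in chosen),
                          key=lambda i: min(dist(points[i], points[c]) for c in chosen)))
    return [list(points[i]) for i in chosen]


def kmeans(points, centers, max_iter=100):
    # Stage 2b: standard Lloyd iterations starting from the density-based seeds.
    labels = []
    for _ in range(max_iter):
        new_labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


def detect_outliers(points, labels, centers, factor=2.0):
    # Stage 3 (hypothetical rule): flag points whose distance to their own center
    # exceeds `factor` times the median member distance of that cluster.
    d = [dist(p, centers[lab]) for p, lab in zip(points, labels)]
    out = []
    for c in range(len(centers)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        if not idx:
            continue
        m = median([d[i] for i in idx])
        out += [i for i in idx if d[i] > factor * m]
    return sorted(out)


def detect(points, k, eps, factor=2.0):
    dens = densities(points, eps)
    centers = init_centers(points, dens, k)
    labels, centers = kmeans(points, centers)
    return detect_outliers(points, labels, centers, factor)


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (5, 30)]
    print(detect(pts, k=2, eps=2.0))  # → [8]
```

On this toy data the isolated point `(5, 30)` has density 0, so it is never picked as a seed; the seeds become `(0, 0)` and `(11, 11)`, and the stage-3 rule then flags index 8. Using the median rather than the mean and standard deviation keeps a single far point from inflating the threshold in a small cluster.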
Publisher
Trans Tech Publications, Ltd.
Cited by
1 article.
1. An Improved Cuckoo Search Based K-Means for Outlier Detection;2023 IEEE International Conference on Electrical, Automation and Computer Engineering (ICEACE);2023-12-29