Improved K-Means Algorithm Based on Outliers Detection in Review Spam Filtering

Published:2014-08
Volume:602-605
Page:2233-2237
ISSN:1662-7482
Container-title:Applied Mechanics and Materials
Short-container-title:AMM

Author:

Ding Zhe Yuan¹, He Ming Ke¹, Gao Ming Ze¹, Li Fang Fang¹

Affiliation:

1. National University of Defense Technology

Abstract

K-means is a common algorithm in text clustering. The traditional K-means algorithm is sensitive to the initial centers: the clustering result depends heavily on them, and the output fluctuates considerably across different inputs. The proposed K-means algorithm combines a feature dictionary with density-based outlier detection to detect outliers in text data. In the first stage, a density parameter is assigned to every data object using a custom distance function. In the second stage, K-means clusters the data based on the density distribution; the K data objects chosen as initial cluster centers are those that lie in high-density areas and are farthest from one another. In the third stage, the outlier detection algorithm identifies the anomalous text sets from the clustering. Experimental results show that the proposed approach can efficiently detect outliers in the data set.
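
The three stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the custom distance function, the feature dictionary, or the outlier criterion, so the code assumes plain Euclidean distance on numeric vectors, a neighbor-count density with a hypothetical radius `eps`, and an outlier rule based on a hypothetical `factor` times the mean member-to-center distance. All function names are illustrative.

```python
import math

def dist(a, b):
    # Assumption: Euclidean distance stands in for the paper's custom distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def densities(points, eps):
    # Stage 1 (assumed form): density = number of other points within radius eps.
    n = len(points)
    return [sum(1 for j in range(n)
                if j != i and dist(points[i], points[j]) <= eps)
            for i in range(n)]

def initial_centers(points, dens, k):
    # Stage 2 seeding: keep points whose density is at least the mean density,
    # start from the densest one, then repeatedly add the candidate whose
    # nearest already-chosen center is farthest away.
    mean_d = sum(dens) / len(dens)
    cand = [i for i, d in enumerate(dens) if d >= mean_d]
    if len(cand) < k:  # fall back to all points if too few dense ones
        cand = list(range(len(points)))
    chosen = [max(cand, key=lambda i: dens[i])]
    while len(chosen) < k:
        rest = [i for i in cand if i not in chosen]
        chosen.append(max(rest, key=lambda i: min(dist(points[i], points[c])
                                                  for c in chosen)))
    return [points[i] for i in chosen]

def kmeans(points, centers, iters=100):
    # Standard Lloyd iterations starting from the density-based seeds.
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda c: dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(tuple(sum(col) / len(members)
                                         for col in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def find_outliers(points, labels, centers, factor):
    # Stage 3 (assumed rule): flag a point when its distance to its own center
    # exceeds `factor` times the mean distance within that cluster.
    d = [dist(p, centers[l]) for p, l in zip(points, labels)]
    flagged = []
    for c in range(len(centers)):
        idx = [i for i, l in enumerate(labels) if l == c]
        if not idx:
            continue
        mean_c = sum(d[i] for i in idx) / len(idx)
        flagged.extend(i for i in idx if d[i] > factor * mean_c)
    return sorted(flagged)

# Toy data: two tight groups plus one isolated point (index 10).
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (5, 30)]
dens = densities(sample, eps=2.0)
labels, centers = kmeans(sample, initial_centers(sample, dens, k=2))
print(find_outliers(sample, labels, centers, factor=2.0))
```

Because the seeds are chosen deterministically from the density values, repeated runs on the same data give the same clustering, which addresses the sensitivity to initial centers that the abstract attributes to traditional K-means. For text data, the vectors would come from a feature representation such as term weights, which this sketch leaves out.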

Publisher

Trans Tech Publications, Ltd.

Link

https://www.scientific.net/AMM.602-605.2233.pdf

References: 10 articles.

1. Jindal N., Liu B. Review Spam Detection[C]/Proc 16th WWW Conference. Banff, 2007. 1189-1190.

2. Jindal N., Liu B. Analyzing and Detecting Review Spam[C]/Proc 7th IEEE International Conference on Data Mining. Omaha, 2007. 547-552.

3. Jindal N., Liu B. Opinion Spam and Analysis[C]/Proc International Conference on WSDM. Palo Alto, 2008. 219-229.

4. Yu-Ru Lin et al. Splog Detection Using Self-similarity Analysis on Blog Temporal Dynamics. In Proceedings of AIRWeb 2007, Banff, Alberta, Canada, May 8, 2007.

5. Archana Bhattarai, Vasile Rus, and Dipankar Dasgupta. Characterizing Comment Spam in the Blogosphere through Content Analysis. IEEE Xplore, (2009).

Cited by 1 article.

1. An Improved Cuckoo Search Based K-Means for Outlier Detection. 2023 IEEE International Conference on Electrical, Automation and Computer Engineering (ICEACE), 2023-12-29.