AN EFFECTIVE FUZZY CLUSTERING ALGORITHM FOR WEB DOCUMENT CLASSIFICATION: A CASE STUDY IN CULTURAL CONTENT MINING

Published:2013-08 Issue:06 Volume:23 Page:869-886
ISSN:0218-1940
Container-title:International Journal of Software Engineering and Knowledge Engineering
language:en
Short-container-title:Int. J. Soft. Eng. Knowl. Eng.

Author:

TSEKOURAS GEORGE E.¹,GAVALAS DAMIANOS¹

Affiliation:

1. Department of Cultural Technology & Communication, University of the Aegean, Mytilene, Lesvos Island, Greece

Abstract

This article presents a novel crawling and clustering method for extracting and processing cultural data from the web in a fully automated fashion. Our architecture relies upon a focused web crawler to download web documents relevant to culture. A focused crawler is a web crawler that searches and processes only those web pages that are relevant to a particular topic. After downloading the pages, we extract from each document a number of words for each thematic cultural area and filter out the documents with non-cultural content; we then create multidimensional document vectors comprising the most frequent cultural term occurrences. We calculate the dissimilarity between the culture-related document vectors and, for each cultural theme, use cluster analysis to partition the documents into a number of clusters. Our approach is validated via a proof-of-concept application that analyzes hundreds of web pages spanning different cultural thematic areas.
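
The vectorize, filter, and cluster steps described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the term list, the filtering threshold, the Euclidean dissimilarity, and the use of standard fuzzy c-means are all assumptions, since the abstract does not specify them. All function and variable names are hypothetical.

```python
import math
import random
import re
from collections import Counter

# Hypothetical term list for one cultural theme (not taken from the paper).
THEME_TERMS = ["painting", "canvas", "sculpture", "marble"]

def term_vector(text, terms=THEME_TERMS):
    """Count how often each theme term occurs in a document."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [float(counts[t]) for t in terms]

def is_cultural(vec, min_hits=1):
    """Assumed filter rule: keep documents with at least min_hits theme terms."""
    return sum(vec) >= min_hits

def euclidean(a, b):
    """Assumed dissimilarity measure between two document vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def fuzzy_c_means(points, c, m=2.0, iters=100, seed=0):
    """Standard fuzzy c-means; returns (memberships, centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])
    centers = []
    for _ in range(iters):
        # Update each center as the mean of the points weighted by u^m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = max(sum(w), 1e-12)  # guard against an empty cluster
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Update memberships from the distances to the centers.
        for i in range(n):
            dists = [euclidean(points[i], centers[j]) for j in range(c)]
            zeros = [j for j, d in enumerate(dists) if d == 0.0]
            if zeros:
                # The point coincides with one or more centers.
                u[i] = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(c)]
            else:
                inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
                total = sum(inv)
                u[i] = [v / total for v in inv]
    return u, centers

# Toy documents standing in for downloaded pages.
docs = [
    "painting canvas painting canvas painting",
    "canvas painting canvas painting canvas",
    "sculpture marble sculpture marble sculpture",
    "marble sculpture marble sculpture marble",
    "cheap flights and hotel deals",  # no theme terms, so it is filtered out
]
vectors = [v for v in map(term_vector, docs) if is_cultural(v)]
memberships, centers = fuzzy_c_means(vectors, c=2)
```

Each row of `memberships` gives one document's degree of membership in every cluster, which is what distinguishes a fuzzy partition from a hard assignment such as k-means.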

Publisher

World Scientific Pub Co Pte Lt

Subject

Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Software

Link

https://www.worldscientific.com/doi/pdf/10.1142/S021819401350023X

References: 18 articles.

1. Classifying Web pages employing a probabilistic neural network

2. Using a new relational concept to improve the clustering performance of search engines

3. Web page classification based on a support vector machine using a weighted vote schema

4. Fuzzy Model Identification Based on Cluster Estimation

5. Clustering Web pages based on their structure

Cited by 4 articles.

1. Obtaining Fuzzy Membership Function of Clusters With the Memristor Hardware Implementation and On-Chip Learning;IEEE Transactions on Emerging Topics in Computational Intelligence;2022-08

3. Web Pages Classification with Parliamentary Optimization Algorithm;International Journal of Software Engineering and Knowledge Engineering;2017-04

4. Fuzzy Clustering Algorithms — Review of the Applications;2016 IEEE International Conference on Smart Cloud (SmartCloud);2016-11