Revealing Groups of Semantically Close Textual Documents by Clustering

Pages: 71-111
ISSN: 2372-109X
Container-title: Advances in Linguistics and Communication Studies

Author:

Dařena František¹,Žižka Jan¹

Affiliation:

1. Mendel University in Brno, Czech Republic

Abstract

The chapter introduces clustering as a family of algorithms that can be successfully used to organize text documents into groups without prior knowledge of these groups. The chapter also demonstrates using unsupervised clustering to group a large amount of unlabeled textual data (customer reviews written informally in five natural languages) so that it can be used later for further analysis. Attention is paid to the process of selecting clustering algorithms and their parameters, to methods of data preprocessing, and to methods of evaluating the results by a human expert with the assistance of a computer. The feasibility has been demonstrated by a number of experiments with external evaluation using known labels and with computer-assisted expert validation. It has been found that the same procedures, including clustering, cluster validation, and detection of topics and significant words, can be applied to different natural languages with satisfactory results.
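
To make the pipeline in the abstract concrete, the following is a minimal sketch of clustering short reviews and then scoring the result against known labels (external evaluation). It is not the chapter's implementation: the TF-IDF weighting, the cosine-based k-means with farthest-first seeding, the purity measure, the function names, and the six toy reviews are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def normalize(vec):
    """Scale a sparse vector (dict) to unit length; leave zero vectors as-is."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Turn raw texts into unit-length TF-IDF vectors stored as dicts."""
    # Keep runs of Unicode letters only; a stand-in for real preprocessing.
    tokenized = [re.findall(r"[^\W\d_]+", d.lower()) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    return [
        normalize({t: c * math.log(n / df[t]) for t, c in Counter(toks).items()})
        for toks in tokenized
    ]


def cosine(a, b):
    """Dot product, which equals cosine similarity for unit-length vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, max_iter=20):
    """Cosine-based k-means with deterministic farthest-first seeding."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Next seed: the document least similar to every seed chosen so far.
        centroids.append(
            min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        )
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors
        ]
        if new_assignment == assignment:
            break  # converged: no document changed its cluster
        assignment = new_assignment
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                total = Counter()
                for v in members:
                    total.update(v)  # adds the weights term by term
                centroids[j] = normalize(dict(total))
    return assignment


def purity(assignment, labels):
    """External evaluation: share of documents in their cluster's majority class."""
    clusters = {}
    for a, lab in zip(assignment, labels):
        clusters.setdefault(a, Counter())[lab] += 1
    return sum(c.most_common(1)[0][1] for c in clusters.values()) / len(labels)


# Hypothetical toy data: three hotel reviews and three restaurant reviews.
docs = [
    "room clean bed comfortable",
    "clean room friendly staff",
    "bed comfortable staff friendly",
    "pizza tasty pasta fresh",
    "fresh pasta great dessert",
    "tasty pizza great dessert",
]
labels = ["hotel", "hotel", "hotel", "food", "food", "food"]

assignment = kmeans(tfidf_vectors(docs), k=2)
print(assignment, purity(assignment, labels))
```

The labels are used only after clustering, to score the result; the grouping itself is fully unsupervised. On real, informally written reviews the preprocessing step (tokenization, stop words, weighting) would likely matter far more than it does on this toy data.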

Publisher

IGI Global

References: 68 articles.

1. A Survey of Text Clustering Algorithms

2. Balian, R. (2013). Entropy, a Protean concept. Séminaire Poincaré.

3. Text Mining

4. Bsoul, Q., Salim, J., & Zakaria, L. Q. (2013). An Intelligent Document Clustering Approach to Detect Crime Patterns. Procedia Technology (4th International Conference on Electrical Engineering and Informatics, ICEEI 2013), 11, 1181–1187.

5. Cao, Y., Zhang, P., Guo, J., & Guo, L. (2014). Mining Large-scale Event Knowledge from Web Text. Procedia Computer Science (2014 International Conference on Computational Science), 29, 478–487.

Cited by 1 article.

1. Text Mining. Advances in Data Mining and Database Management, 2017.