Affiliation:
1. Hubei Normal University
Abstract
Text clustering typically involves clustering in a high-dimensional space, which is difficult in virtually all practical settings. In addition, given a particular clustering result, it is usually very hard to explain why the text clusters were constructed the way they were. To address these problems, this paper proposes a method for Chinese document clustering based on topic concept clustering. The method, called Document Features Indexing Clustering (DFIC), can identify topics accurately and cluster documents according to these topics. In DFIC, “topic elements” are defined and extracted for indexing base clusters. Additionally, document features are investigated and exploited. Experimental results show that DFIC achieves a higher precision (92.76%) than some widely used traditional clustering methods.
Publisher
Trans Tech Publications, Ltd.
Cited by
1 article.