Affiliation:
1. Ming-Chuan University, Taipei, Taiwan
Abstract
Most research on text categorization focuses on the bag-of-words representation. Some studies have proposed other representations for classification, such as term phrases, Latent Semantic Indexing, and term clustering. Term clustering is an effective approach to classification and has been shown to be a good method for reducing the dimensionality of term vectors. The authors use hierarchical term clustering to aggregate similar terms. To enhance performance, they present a modified indexing scheme based on the terms in each cluster. Their test collection was extracted from Chinese NETNEWS, and they used a centroid-based classifier for categorization. The results show that term clustering not only reduces dimensionality but also outperforms bag of words. Term clustering can therefore be applied to text classification with any large corpus, with the aim of saving time and increasing efficiency and effectiveness. Beyond performance, the clusters can be regarded as a conceptual knowledge base that retains related real-world terms.
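The pipeline described in the abstract (index documents by term cluster, then classify with a centroid-based classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the cluster table `TERM_TO_CLUSTER` is hand-written rather than produced by hierarchical clustering, the cluster ids and English toy terms are hypothetical, and raw counts with cosine similarity stand in for whatever weighting and similarity measure the authors actually used.

```python
import math
from collections import Counter, defaultdict

# Hypothetical term clusters. In the paper these would come from hierarchical
# term clustering; here they are hand-written for illustration only.
TERM_TO_CLUSTER = {
    "goal": "C_SPORT", "match": "C_SPORT", "team": "C_SPORT",
    "stock": "C_MONEY", "market": "C_MONEY", "bank": "C_MONEY",
}

def index_by_cluster(tokens):
    """Count cluster ids instead of raw terms; unclustered terms keep their own dimension."""
    return Counter(TERM_TO_CLUSTER.get(t, t) for t in tokens)

def normalize(vec):
    """Scale a sparse vector to unit length (empty dict for a zero vector)."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: v / norm for k, v in vec.items()} if norm else {}

def train_centroids(labeled_docs):
    """Sum the unit-length document vectors of each class, then normalize the sum."""
    sums = defaultdict(Counter)
    for tokens, label in labeled_docs:
        for k, v in normalize(index_by_cluster(tokens)).items():
            sums[label][k] += v
    return {label: normalize(vec) for label, vec in sums.items()}

def classify(tokens, centroids):
    """Return the label whose centroid has the highest cosine similarity to the document."""
    doc = normalize(index_by_cluster(tokens))

    def cosine(centroid):
        return sum(w * centroid.get(k, 0.0) for k, w in doc.items())

    return max(centroids, key=lambda label: cosine(centroids[label]))

train = [(["goal", "team"], "sports"), (["stock", "bank"], "finance")]
centroids = train_centroids(train)
print(classify(["match"], centroids))
print(classify(["market"], centroids))
```

In this toy setup the terms "match" and "market" never occur in the training documents, yet they can still be classified because they share a cluster dimension with terms that do. This is the sense in which cluster-based indexing both shrinks the vector space and generalizes beyond exact term matches.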
Subject
Information Systems and Management, Computer Science Applications, Management Information Systems
Cited by
1 article.