Affiliation:
1. Huaiyin Institute of Technology
Abstract
To address keyword sparsity and similarity drift in short texts, this paper proposes a short text clustering algorithm with feature keyword expansion (STCAFKE). The method expands feature keywords using HowNet and then clusters the texts by combining the K-means algorithm with a density-based algorithm. Keyword expansion increases the number of keywords per text and enriches its semantic features, which makes short texts easier to cluster. Experimental results show that the algorithm improves short text clustering quality in terms of precision and recall.
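The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the density step and K-means are combined, so the sketch assumes the density step is used to pick initial centers. `TOY_LEXICON` is a hypothetical stand-in for HowNet, and the function names, the `radius` threshold, and the sample texts are all invented for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for HowNet: maps a keyword to related terms.
TOY_LEXICON = {
    "phone": ["mobile", "handset"],
    "mobile": ["phone", "handset"],
    "soccer": ["football", "match"],
    "football": ["soccer", "match"],
}


def expand(tokens, lexicon=TOY_LEXICON):
    """Return a term-count vector of the tokens plus their related terms."""
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(lexicon.get(tok, []))
    return Counter(expanded)


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def density_seeds(vectors, k, radius=0.3):
    """Assumed density step: take the densest points as seeds,
    skipping any point that is too similar to an already chosen seed."""
    density = [sum(cosine(v, w) >= radius for w in vectors) for v in vectors]
    order = sorted(range(len(vectors)), key=lambda i: -density[i])
    seeds = []
    for i in order:
        if len(seeds) < k and all(
            cosine(vectors[i], vectors[s]) < radius for s in seeds
        ):
            seeds.append(i)
    # Fall back to the remaining densest points if too few seeds were found.
    for i in order:
        if len(seeds) < k and i not in seeds:
            seeds.append(i)
    return seeds


def cluster(texts, k, iterations=10):
    """Expand keywords, seed by density, then run cosine-based K-means."""
    vectors = [expand(t) for t in texts]
    centers = [Counter(vectors[i]) for i in density_seeds(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each text to its most similar center.
        labels = [
            max(range(len(centers)), key=lambda c: cosine(v, centers[c]))
            for v in vectors
        ]
        # Recompute each center as the sum of its members
        # (cosine similarity ignores scale, so a sum behaves like a mean).
        for c in range(len(centers)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == c:
                    total.update(v)
            if total:
                centers[c] = total
    return labels


texts = [
    ["new", "phone", "price"],
    ["mobile", "price", "drop"],
    ["soccer", "final", "tonight"],
    ["football", "final", "score"],
]
labels = cluster(texts, k=2)
print(labels)
```

In this toy data the first two texts share only one raw keyword, but after expansion they share four, which is the effect the abstract attributes to feature keyword expansion. A real system would replace `TOY_LEXICON` with HowNet-based similarity and would need word segmentation for Chinese text.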
Publisher
Trans Tech Publications, Ltd.
Cited by
5 articles.