Affiliation:
1. Gannan Normal University
Abstract
Text classification is challenging because text data are high-dimensional and sparse, and because natural language has complex semantics. Documents are typically represented in a vector space using the bag-of-words (BoW) model. Although BoW is easy to use, it does not consider the semantic similarity between words. In this paper, we address this shortcoming by enriching the BoW representation with the exponential kernel, which models semantic similarity by means of a diffusion process on a graph defined by lexicon and co-occurrence information. Experimental evaluation on real data sets shows that our approach, combined with a support vector machine (SVM), achieves higher classification accuracy than the plain BoW approach.
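To illustrate the idea of enriching BoW with an exponential kernel, here is a minimal sketch. It is not the paper's implementation. The toy vocabulary, the term graph, the decay parameter `beta`, the form K = exp(beta * A), and all function names are assumptions made for illustration. The matrix exponential is approximated by a truncated Taylor series in plain Python, so the sketch needs no third-party library.

```python
# Hypothetical toy setup: 3 terms, with "car" and "automobile" linked in the
# term graph (e.g. via a lexicon or co-occurrence); "banana" is isolated.
VOCAB = ["car", "automobile", "banana"]
ADJ = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]


def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def exp_kernel(adj, beta=0.5, terms=30):
    """Approximate K = exp(beta * adj) with a truncated Taylor series."""
    n = len(adj)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [row[:] for row in result]  # holds (beta * adj)^k / k!
    for k in range(1, terms):
        power = mat_mul(power, adj)
        power = [[beta * v / k for v in row] for row in power]
        for i in range(n):
            for j in range(n):
                result[i][j] += power[i][j]
    return result


def bow_similarity(d1, d2):
    """Plain BoW inner product between two term-count vectors."""
    return sum(x * y for x, y in zip(d1, d2))


def semantic_similarity(d1, d2, kernel):
    """Kernel-enriched similarity d1^T K d2."""
    n = len(kernel)
    return sum(d1[i] * kernel[i][j] * d2[j] for i in range(n) for j in range(n))


K = exp_kernel(ADJ, beta=0.5)
doc_a = [1, 0, 0]  # contains only "car"
doc_b = [0, 1, 0]  # contains only "automobile"

print("BoW similarity:     ", bow_similarity(doc_a, doc_b))
print("Enriched similarity:", semantic_similarity(doc_a, doc_b, K))
```

In this toy case the two documents share no terms, so the plain BoW similarity is zero, while the enriched similarity is positive because the kernel diffuses weight along the edge between "car" and "automobile". A matrix of such pairwise similarities could then be passed to an SVM that accepts a precomputed kernel.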
Publisher
Trans Tech Publications, Ltd.
Cited by
5 articles.