Affiliation:
1. Departamento de Ciencias de la Computación e Inteligencia Artificial, CITIC-UGR, Universidad de Granada, 18071 Granada, Spain
Abstract
This paper proposes a new dimensionality-reduction method for text classification that applies the discrete wavelet transform to the document-term frequency matrix. We analyse the features provided by the wavelet coefficients in each orientation: (1) high-energy coefficients in the horizontal orientation correspond to terms that are relevant in a single document; (2) high-energy coefficients in the vertical orientation correspond to terms that are relevant for a single document but not for the others; and (3) high-energy coefficients in the diagonal orientation correspond to terms that are relevant in a document in comparison with other terms. By filtering the terms with the wavelet coefficients so that these three conditions hold simultaneously, we obtain a reduced vocabulary of the corpus with fewer dimensions than the original one. To test the effectiveness of the reduced vocabulary, we recoded the corpus with it and obtained a statistically relevant level of accuracy in document classification.
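The filtering idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it assumes a single level of the orthonormal Haar wavelet, zero-padding to even dimensions, and a hypothetical "high energy" rule (a coefficient's square exceeds `k` times the mean squared coefficient of its subband). The helper names `pad_even`, `haar_dwt2` and `select_terms`, the parameter `k`, and the rule that attributes each coefficient to the heavier of its two terms are all illustrative assumptions. How the three detail subbands map onto the paper's "horizontal", "vertical" and "diagonal" labels depends on convention, but because all three must be high at once, the selected vocabulary does not depend on that mapping.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pad_even(m: Sequence[Sequence[float]]) -> Matrix:
    """Copy m as floats, zero-padding so both dimensions are even."""
    if not m or not m[0]:
        raise ValueError("document-term matrix must be non-empty")
    rows = [[float(v) for v in r] for r in m]
    if len(rows[0]) % 2:
        for r in rows:
            r.append(0.0)
    if len(rows) % 2:
        rows.append([0.0] * len(rows[0]))
    return rows


def haar_dwt2(m: Sequence[Sequence[float]]) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar transform (assumed wavelet).

    Rows are documents and columns are terms. Each 2x2 block
    [[a, b], [c, d]] (documents 2i, 2i+1; terms 2j, 2j+1) yields one
    coefficient per subband at position (i, j).
    """
    x = pad_even(m)
    n_r, n_c = len(x) // 2, len(x[0]) // 2
    approx = [[0.0] * n_c for _ in range(n_r)]
    docs = [[0.0] * n_c for _ in range(n_r)]
    terms = [[0.0] * n_c for _ in range(n_r)]
    diag = [[0.0] * n_c for _ in range(n_r)]
    for i in range(n_r):
        for j in range(n_c):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            approx[i][j] = (a + b + c + d) / 2  # smoothed counts
            docs[i][j] = (a + b - c - d) / 2    # change between the two documents
            terms[i][j] = (a - b + c - d) / 2   # change between the two terms
            diag[i][j] = (a - b - c + d) / 2    # diagonal (document x term) change
    return approx, docs, terms, diag


def select_terms(dtm: Sequence[Sequence[float]], vocab: Sequence[str], k: float = 1.0) -> List[str]:
    """Keep terms whose three detail subbands all have high energy.

    Hypothetical rule: a coefficient is 'high energy' when its square
    exceeds k times the mean squared coefficient of its subband.
    """
    _, docs, terms, diag = haar_dwt2(dtm)
    bands = (docs, terms, diag)

    def threshold(band: Matrix) -> float:
        energies = [c * c for row in band for c in row]
        return k * sum(energies) / len(energies)

    limits = [threshold(b) for b in bands]
    keep = set()
    for i in range(len(docs)):
        for j in range(len(docs[0])):
            if all(b[i][j] ** 2 > t for b, t in zip(bands, limits)):
                # Positive term-axis detail means term 2j outweighs term 2j+1
                # in this block, so attribute the coefficient to term 2j.
                keep.add(2 * j if terms[i][j] > 0 else 2 * j + 1)
    return [vocab[t] for t in sorted(keep) if t < len(vocab)]


# Toy corpus (hypothetical): "the" and "and" occur uniformly, while
# "wavelet" and "gene" are each concentrated in a single document.
vocab = ["wavelet", "the", "gene", "and"]
dtm = [
    [5, 1, 0, 1],
    [0, 1, 0, 1],
    [0, 1, 5, 1],
    [0, 1, 0, 1],
]
print(select_terms(dtm, vocab))  # ['wavelet', 'gene']
```

Recoding the corpus with the reduced vocabulary then amounts to keeping only the matrix columns of the selected terms before training a classifier. A multi-level transform or a different wavelet would change which terms survive, so the single-level Haar choice here is only the simplest option to inspect by hand.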
Publisher
World Scientific Pub Co Pte Lt
Subject
Library and Information Sciences, Computer Networks and Communications, Computer Science Applications
Cited by
1 article.