Affiliation:
1. Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, México
2. University of Informatic Sciences, Havana, Cuba
Abstract
Humans tend to organize information in documents in a logical and intentional way. This organization, which we call textual structure, is commonly expressed in terms of sections, chapters, paragraphs, or sentences. This structure makes it easier for readers to understand the content being conveyed. However, this structure, which usually encodes the semantic content of the information, is rarely exploited by filtering methods when constructing a user profile. In this work, we propose using term relations at different context levels to enhance document filtering. We propose methods for obtaining this representation that account for the imbalance between the documents that satisfy the information needs of users, as well as the Cold Start problem (having scarce information) during the initial construction of the user profile. The experiments carried out allowed us to assess the impact of the proposed representation on the filtering task, in terms of the T11SU measure.
Subject
Artificial Intelligence, Computer Vision and Pattern Recognition, Theoretical Computer Science