Topic Models Ensembles for AD-HOC Information Retrieval

Published:2021-09-01 Issue:9 Volume:12 Page:360
ISSN:2078-2489
Container-title:Information
language:en
Short-container-title:Information

Author:

Ormeño Pablo, Mendoza Marcelo, Valle Carlos

Abstract

Ad hoc information retrieval (ad hoc IR) is the challenging task of ranking text documents in response to bag-of-words (BOW) queries. Classic approaches represent queries and documents as term vectors and rank the documents with term-weighting functions. One limitation of these methods is that they cannot handle polysemous concepts. In addition, they introduce spurious orthogonality between semantically related words. To address these limitations, topic-based IR models have been explored. In particular, topic models based on Latent Dirichlet Allocation (LDA) represent text documents in a latent topic space, which models polysemy better and avoids orthogonal representations of related terms. We extend LDA-based IR with ensemble strategies. Model selection follows the ensemble learning paradigm, for which we test two successful approaches widely used in supervised learning: we study Boosting and Bagging techniques for topic models, treating each model as a weak IR expert. We then merge the ranked lists obtained from each model using a simple but effective top-k list fusion approach. We show that our proposal improves precision and recall, outperforming classic IR models and strong baselines based on topic models.
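The pipeline described above (each topic model acting as a weak IR expert, followed by top-k list fusion) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each expert scores documents by a plain LDA query likelihood, log P(q|d) = Σ_w log Σ_k P(w|k) P(k|d), and the abstract does not say which scoring function the paper uses. The fusion step is a Borda-count stand-in, because the abstract does not specify its "simple but effective" fusion rule. The function names (`lda_query_log_likelihood`, `top_k`, `borda_fuse`) and the toy matrices `PHI_A`, `THETA_A`, `PHI_B`, `THETA_B` are hypothetical; the toy models stand in for topic models that would, for example, be trained on bootstrap samples of the corpus under Bagging.

```python
import math
from collections import defaultdict

# Hypothetical sketch: LDA query-likelihood ranking per ensemble member,
# then Borda-style top-k fusion. Neither is confirmed as the paper's method.

def lda_query_log_likelihood(query, theta_d, phi):
    """log P(q|d) = sum over query terms w of log sum_k P(w|k) P(k|d)."""
    score = 0.0
    for w in query:
        p = sum(phi[k].get(w, 0.0) * theta_d[k] for k in range(len(phi)))
        score += math.log(max(p, 1e-12))  # floor avoids log(0) for unseen terms
    return score

def top_k(query, theta, phi, k):
    """One weak IR expert: rank documents by LDA query likelihood, keep k."""
    ranked = sorted(theta,
                    key=lambda d: lda_query_log_likelihood(query, theta[d], phi),
                    reverse=True)
    return ranked[:k]

def borda_fuse(rankings, k):
    """Assumed fusion rule: a document at 0-based rank r in a list earns k - r points."""
    points = defaultdict(int)
    for ranking in rankings:
        for r, doc in enumerate(ranking[:k]):
            points[doc] += k - r
    # Sort by total points, breaking ties by document id for determinism.
    return sorted(points, key=lambda d: (-points[d], d))[:k]

# Two toy "weak experts" (hypothetical values).
# phi[k][w] = P(w | topic k); theta[d][k] = P(topic k | document d).
PHI_A = [{"bank": 0.5, "river": 0.5}, {"bank": 0.5, "money": 0.5}]
THETA_A = {"d1": [0.9, 0.1], "d2": [0.1, 0.9], "d3": [0.5, 0.5]}
PHI_B = [{"money": 0.6, "bank": 0.4}, {"river": 0.7, "bank": 0.3}]
THETA_B = {"d1": [0.6, 0.4], "d2": [0.2, 0.8], "d3": [0.9, 0.1]}

query = ["bank", "money"]
lists = [top_k(query, THETA_A, PHI_A, 2), top_k(query, THETA_B, PHI_B, 2)]
print(lists)                 # → [['d2', 'd3'], ['d3', 'd1']]
print(borda_fuse(lists, 2))  # → ['d3', 'd2']
```

In this toy run the two experts disagree on the top document, and fusion promotes `d3` because both experts place it in their top-k lists. Borda counting rewards that kind of cross-expert agreement; reciprocal rank fusion is another common choice for the same role.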

Funder

National Agency of Research and Development

Publisher

MDPI AG

Subject

Information Systems

Link

https://www.mdpi.com/2078-2489/12/9/360/pdf

Cited by 3 articles.

1. Leveraging Generative AI in Short Document Indexing;Electronics;2024-09-08

2. AI in Academic Libraries;Advances in Library and Information Science;2024-06-28

3. PerAnSel: A Novel Deep Neural Network-Based System for Persian Question Answering;Computational Intelligence and Neuroscience;2022-07-18