Affiliation:
1. Information Retrieval Group, Department of Computer Applications, National Institute of Technology, Tamil Nadu, India
Abstract
Query expansion is an important task in information retrieval: it refines the user query and helps retrieve relevant documents. In this paper, an N-gram Thesaurus is constructed from the documents for query expansion. The HTML TAGs in web documents are considered and their syntactical context is analysed. Each TAG is assigned a weight based on its nature, properties and significance. The term weight is then calculated from the corresponding TAG weight and the term frequency, and is stored in the inverted index. All single terms in the inverted index are added to the Thesaurus as Unigrams, and Bigrams are constructed from the Unigrams. Likewise, the (N + 1)-grams are generated from the N-grams and their weights and are added to the Thesaurus. During the query session, the user query terms are expanded using the N-grams predicted by the Thesaurus, which are offered to the user as suggestions. The performance of the proposed approach is evaluated on the ClueWeb09B, WT10g and GOV2 benchmark datasets. Improvement gain over the baseline is used as the evaluation parameter, and the proposed approach achieved gains of 7.9% on ClueWeb09B, 18.3% on WT10g and 29.4% on GOV2 in terms of Mean Average Precision (MAP). We also compared the performance of the proposed approach with two other query expansion approaches, KLDCo and BoCo. The approach achieved 0.574 (+0.236), 0.519 (+0.209), 0.422 (+0.185) and 0.654 (+0.243) in terms of P@5, P@10, MAP and MRR, respectively, where the values in parentheses are the gains over the baselines.
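The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the TAG weights in `TAG_WEIGHTS`, the rule that an N-gram occurrence is scored by the mean TAG weight of its terms, and the names `TagTermParser`, `build_thesaurus` and `suggest` are all assumptions made for illustration, since the abstract gives neither the weight values nor the exact weighting formula.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical TAG weights; the abstract does not state the actual values.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "b": 2.0, "p": 1.0}
DEFAULT_WEIGHT = 1.0
# Tags without a closing tag are ignored so they do not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class TagTermParser(HTMLParser):
    """Collects (term, tag_weight) pairs in document order."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.tokens = []  # list of (term, weight)

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        tag = self.stack[-1] if self.stack else None
        weight = TAG_WEIGHTS.get(tag, DEFAULT_WEIGHT)
        for term in data.lower().split():
            self.tokens.append((term, weight))


def build_thesaurus(docs, max_n=3):
    """Map each N-gram (a tuple of terms) to an accumulated weight.

    Assumption: every occurrence adds the mean TAG weight of its terms,
    so the total reflects both TAG weight and frequency.
    """
    thesaurus = defaultdict(float)
    for html in docs:
        parser = TagTermParser()
        parser.feed(html)
        parser.close()
        tokens = parser.tokens
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                window = tokens[i:i + n]
                gram = tuple(term for term, _ in window)
                thesaurus[gram] += sum(w for _, w in window) / n
    return thesaurus


def suggest(thesaurus, query_terms, k=3):
    """Return up to k (N + 1)-grams that extend the N query terms."""
    prefix = tuple(term.lower() for term in query_terms)
    n = len(prefix) + 1
    candidates = [
        (gram, weight)
        for gram, weight in thesaurus.items()
        if len(gram) == n and gram[:-1] == prefix
    ]
    # Highest weight first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [" ".join(gram) for gram, _ in candidates[:k]]
```

In this sketch, a term pair that appears inside a `title` element outranks the same pair in body text, so `suggest(thesaurus, ["query"])` would favour expansions seen in prominent TAGs. A real system would also need tokenisation, stop-word handling and an inverted index, which are omitted here.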
Subject
Library and Information Sciences, Information Systems
Cited by
4 articles.