Query Expansion Using Proposed Location-Based Algorithm for Hindi–English CLIR: Analyzing Three Test Collections

Author:

Chandra Ganesh (1, 2), Dwivedi Sanjay K. (2)

Affiliation:

1. Department of Computer Science & Engineering, Madhav Institute of Technology & Science, (Deemed to be University), Gwalior, Madhya Pradesh, India

2. Department of Computer Science, Babasaheb Bhimrao Ambedkar (A Central) University, Lucknow, Uttar Pradesh, India

Abstract

The rapid growth of Web content in different languages increases the demand for Cross-Lingual Information Retrieval (CLIR). Retrieval accuracy suffers from problems such as ambiguity and query drift. Query Expansion (QE) offers a reliable way to obtain suitable documents for user queries. In this paper, we propose an architecture for a Hindi–English CLIR system that uses QE to improve the relevance of the retrieved results. Within this architecture, we propose a location-based algorithm that adds term(s) at appropriate position(s) to resolve the query drift issue in QE. User queries in Hindi are translated into the document language (i.e. English), and translation accuracy is improved using Back-Translation. A Google search is then performed, and the retrieved documents are ranked with Okapi BM25 in order of decreasing relevance so that the most suitable terms can be selected for QE. We use the term selection value (TSV) for QE, and to retrieve the terms we created three test collections: (i) the description and narration fields of the Forum for Information Retrieval Evaluation (FIRE) dataset, (ii) Snippets of the documents retrieved for each query and (iii) Nearest-Neighborhood (NN) words for each query word among the ranked documents. To evaluate the system, 50 Hindi queries were selected from the FIRE-2012 dataset. We performed two experiments: (i) the impact of the proposed location-based algorithm on the proposed CLIR architecture; and (ii) an analysis of QE using the three test collections, i.e. FIRE, NN and Snippets. In the first experiment, the results show that the relevance of Hindi–English CLIR improves when QE is performed with the location-based algorithm: an improvement of 12% is achieved over the results of QE obtained without the location-based algorithm. In the second experiment, the location-based algorithm is applied to the three test collections.
The Mean Average Precision (MAP) values of the retrieved documents after QE are 0.5379 (NN), 0.6018 (FIRE) and 0.6406 (Snippets), whereas the MAP before QE is 0.37102. This shows a clear improvement in the retrieved results for all three test collections. Among the three, QE is most effective with Snippets, which improves on the FIRE and NN test collections by 6.48% and 19.12%, respectively.
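The evaluation arithmetic in the abstract can be illustrated with a minimal sketch. It assumes the standard definition of MAP (the mean, over queries, of the average precision of each ranked list) and reads the reported percentages as relative gains; the abstract does not state how they were computed. The function names and the toy relevance lists are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def average_precision(ranked_relevance: Sequence[bool], total_relevant: int) -> float:
    """Average precision for one query.

    ranked_relevance[i] is True when the document at rank i + 1 is relevant.
    total_relevant is the number of relevant documents known for the query.
    """
    if total_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / total_relevant


def mean_average_precision(runs: List[Tuple[Sequence[bool], int]]) -> float:
    """MAP: the mean of the per-query average precision values."""
    return sum(average_precision(rel, n) for rel, n in runs) / len(runs)


def relative_gain(new: float, base: float) -> float:
    """Relative improvement of `new` over `base`, in percent (assumed reading)."""
    return 100.0 * (new - base) / base


# Toy data (hypothetical): two queries with made-up relevance judgments.
toy_runs = [([True, False, True], 2), ([False, True], 1)]
print(f"toy MAP = {mean_average_precision(toy_runs):.4f}")

# MAP values as reported in the abstract.
map_snippets, map_fire, map_nn = 0.6406, 0.6018, 0.5379
print(f"Snippets vs FIRE: {relative_gain(map_snippets, map_fire):.2f}%")
print(f"Snippets vs NN:   {relative_gain(map_snippets, map_nn):.2f}%")
```

Applied to the rounded MAP values quoted above, this reading gives roughly 6.45% and 19.09%, close to the reported 6.48% and 19.12%; the small gap may come from rounding of the published MAP figures or from a slightly different computation in the paper.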

Publisher

World Scientific Pub Co Pte Ltd
