A selective approach to stemming for minimizing the risk of failure in information retrieval systems

Author:

Göksel Gökhan¹, Arslan Ahmet¹, Dinçer Bekir Taner²

Affiliation:

1. Computer Engineering, Eskişehir Technical University, Eskisehir, Turkey

2. Computer Engineering, Muğla Sıtkı Koçman University, Mugla, Turkey

Abstract

Stemming is supposed to improve the average performance of an information retrieval (IR) system, but past experimental results show that this is not always the case in practice. In this article, we propose a selective approach to stemming that decides, on a per-query basis, whether stemming should be applied. Our method aims to minimize the risk of failure that stemming causes in retrieving semantically related documents. The main contributions to the IR literature are an application of selective stemming and a set of new features derived from the term frequency distributions of the systems under selection. The method built on this approach combines a number of existing query performance predictors with the derived features and a machine learning technique. It is comprehensively evaluated using three rule-based stemmers and eight query sets corresponding to four document collections from the standard TREC and NTCIR datasets. All but one of the document collections consist of Web documents, with sizes ranging from 25 million to 733 million documents. The experimental results show that the method makes accurate selections that increase the robustness of the system and minimize the risk of failure (i.e., per-query performance losses) across queries. The results also show that, for most query sets, the method attains a systematically higher average retrieval performance than the single systems.
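
The idea of per-query selection and of "risk" as per-query performance loss can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the scores are invented, the selector output is hard-coded rather than produced by a learned model, and the `risk` function is a simplified loss average, not necessarily the measure used in the article.

```python
from statistics import mean


def selective_run(no_stem, stem, use_stem):
    """Per query, take the score of whichever system the selector chose."""
    return {q: (stem[q] if use_stem[q] else no_stem[q]) for q in no_stem}


def risk(baseline, run):
    """Average per-query loss of `run` relative to `baseline` (gains are ignored).

    Simplified, hypothetical definition used only for this illustration.
    """
    return mean(max(0.0, baseline[q] - run[q]) for q in baseline)


# Hypothetical per-query effectiveness scores (not taken from the article).
no_stem = {"q1": 0.40, "q2": 0.30, "q3": 0.50}
stem = {"q1": 0.55, "q2": 0.10, "q3": 0.50}

# Hypothetical selector decisions; in the article a learned model makes them.
use_stem = {"q1": True, "q2": False, "q3": False}

selective = selective_run(no_stem, stem, use_stem)

print("mean no-stem  :", mean(no_stem.values()))
print("mean stem     :", mean(stem.values()))
print("mean selective:", mean(selective.values()))
print("risk stem     :", risk(no_stem, stem))
print("risk selective:", risk(no_stem, selective))
```

In this toy data, always stemming helps one query and hurts another, while a correct per-query selection keeps the gain and avoids the loss, which is the behaviour the abstract describes as increased robustness.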

Funder

TÜBİTAK, scientific and technological research projects funding program

Publisher

PeerJ

Subject

General Computer Science

