Meta-search based approach for Arabic information retrieval

Author:

Ben Guirat Souheila, Bounhas Ibrahim, Slimani Yahya

Abstract

Purpose: The semantic relations between Arabic word representations were recognized and widely studied in theoretical linguistics many centuries ago. Nonetheless, most previous research in automatic information retrieval (IR) has focused on stem-based or root-based indexing, while lemmas and patterns remain under-exploited. The authors believe that each of the four morphological levels (root, stem, lemma and pattern) encapsulates part of the meaning of words. The purpose is therefore to aggregate these levels using more sophisticated approaches to reach the combination that best enhances IR.

Design/methodology/approach: The authors first compare state-of-the-art Arabic natural language processing (NLP) tools in IR. This allows them to select the most accurate tool at each representation level, i.e. to develop four basic IR systems. The authors then compare two rank aggregation approaches that combine the results of these systems. The first approach is based on linear combination, while the second exploits classification-based meta-search.

Findings: Combining different word representation levels consistently and significantly enhances IR results. The proposed classification-based approach outperforms linear combination and all the basic systems.

Research limitations/implications: The work rests on a standard experimental comparative study that assesses several NLP tools and combination approaches on different test collections and IR models. It may therefore help future research to choose the most suitable tools and to develop more sophisticated methods for handling the complexity of the Arabic language.

Originality/value: The originality of the idea is to treat the richness of Arabic as an exploitable characteristic rather than a limiting challenge. Accordingly, the authors combine four different morphological levels for the first time in Arabic IR. This approach substantially outperformed previous research results.

Peer review: The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-11-2020-0515
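
To make the linear-combination baseline mentioned in the abstract more concrete, the following is a minimal Python sketch of weighted score fusion over four retrieval runs. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the normalization scheme, the weights, or the data, so the min-max normalization, the equal weights, the function names and the document IDs (`d1` to `d4`) are all hypothetical. The classification-based meta-search approach is not sketched here, since the abstract does not describe its features or classifier.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combination(runs, weights):
    """Fuse several runs by a weighted sum of their normalized scores.

    runs:    {system_name: {doc_id: score}}
    weights: {system_name: weight}
    Returns a list of (doc_id, fused_score), best first.
    A document missing from a run contributes 0 for that run.
    """
    fused = defaultdict(float)
    for name, scores in runs.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[name] * s
    # Sort by descending fused score; break ties by doc id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical scores from four basic IR systems, one per morphological level.
runs = {
    "root": {"d1": 12.0, "d2": 8.0, "d3": 4.0},
    "stem": {"d1": 3.0, "d2": 5.0},
    "lemma": {"d2": 0.9, "d3": 0.1, "d4": 0.5},
    "pattern": {"d4": 2.0, "d1": 1.0},
}
# Equal weights are an assumption; in practice they would be tuned.
weights = {"root": 0.25, "stem": 0.25, "lemma": 0.25, "pattern": 0.25}

ranking = linear_combination(runs, weights)
for doc, score in ranking:
    print(doc, round(score, 3))
```

Normalizing each run before summing keeps a system with a larger raw score range from dominating the fusion; the weights could then be tuned on held-out queries for each test collection.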

Publisher

Emerald

Subject

Library and Information Sciences, Computer Science Applications, Information Systems

