Towards Kurdish Information Retrieval

Published:2014-06 Issue:2 Volume:13 Page:1-18
ISSN:1530-0226
Container-title:ACM Transactions on Asian Language Information Processing
language:en

Author:

Esmaili Kyumars Sheykh¹,Salavati Shahin²,Datta Anwitaman³

Affiliation:

1. Technicolor, France

2. University of Kurdistan, Iran

3. Nanyang Technological University, Singapore

Abstract

The Kurdish language is an Indo-European language spoken in Kurdistan, a large geographical region in the Middle East. Despite having a large number of speakers, Kurdish is among the less-resourced languages and has not seen much attention from the IR and NLP research communities. This article reports on the outcomes of a project aimed at providing essential resources for processing Kurdish texts. A principal output of this project is Pewan, the first standard Test Collection to evaluate Kurdish Information Retrieval systems. The other language resources that we have built include a lightweight stemmer and a list of stopwords. Our second principal contribution is using these newly-built resources to conduct a thorough experimental study on Kurdish documents. Our experimental results show that normalization, and to a lesser extent, stemming, can greatly improve the performance of Kurdish IR systems.

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Link

https://dl.acm.org/doi/pdf/10.1145/2556948

References: 57 articles.

1. Hajir Abollahpour. 2013. Hajir Dictionary. http://kurmanj.ir/news.php?readmore=76.

2. Hamshahri: A standard Persian text collection

3. N-gram and Local Context Analysis for Persian text retrieval

4. A comparison between allophone, syllable, and diphone based TTS systems for Kurdish language

Cited by 6 articles.

1. A Systematic Review of Stemmers of Indian and Non-Indian Vernacular Languages;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-01-15

2. CURE: Collection for Urdu Information Retrieval Evaluation and Ranking;2021 International Conference on Digital Futures and Transformative Technologies (ICoDT2);2021-05-20

3. Text Preprocessing for Text Mining in Organizational Research: Review and Recommendations;Organizational Research Methods;2020-11-23

4. Next word prediction based on the N-gram model for Kurdish Sorani and Kurmanji;Neural Computing and Applications;2020-08-11

5. A Rule-Based Kurdish Text Transliteration System;ACM Transactions on Asian and Low-Resource Language Information Processing;2019-06-30