The Rule-Based Sundanese Stemmer

Author:

Suryani Arie Ardiyanti (1), Widyantoro Dwi Hendratmo (1), Purwarianti Ayu (1), Sudaryat Yayat (2)

Affiliation:

1. STEI, Institut Teknologi Bandung

2. FPBS, Universitas Pendidikan Indonesia

Abstract

Our research proposed an iterative Sundanese stemmer that removes derivational affixes before inflectional ones. This order was chosen because, in Sundanese affixation, a confix (one type of derivational affix) is applied in the last phase of the morphological process. Moreover, most Sundanese affixes are derivational, so removing derivational affixes first is reasonable. To handle ambiguity, the last recognized affix was returned as the result. As the baseline, we used a Confix-Stripping Approach that applies the Porter Stemmer to the Indonesian language. This stemmer handles similar affix types but uses a different stemming order. To observe whether the baseline stems Sundanese affixed words properly, we added features that it did not cover, such as infix and allomorph removal. The evaluation used 4,453 unique affixed words collected from Sundanese online magazines. The experiment shows that, overall, our stemmer outperforms the modified baseline in both affix-type recognition accuracy and the proportion of properly stemmed affixed words. Our stemmer recognized 68.87% of the Sundanese affix types and correctly stemmed 96.79% of the affixed words; the modified baseline achieved 21.70% and 71.59%, respectively.
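The stemming order described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rule tables, the grouping of each affix as derivational or inflectional, the `MIN_STEM` length guard, the lexicon lookup used as a stopping condition, and the names `strip_affix` and `stem` are all assumptions made for illustration. Infix and allomorph handling and the abstract's ambiguity rule are not modeled.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical toy rule tables: (label, prefix, suffix).
# These are NOT the paper's rules; the derivational/inflectional
# grouping below is an assumption made only for this sketch.
DERIVATIONAL = [
    ("ka-...-an", "ka", "an"),  # confix: prefix and suffix removed together
    ("pa-", "pa", ""),
    ("-an", "", "an"),
]
INFLECTIONAL = [
    ("-na", "", "na"),
]

MIN_STEM = 3  # assumed minimum length of what remains after stripping


def strip_affix(word: str, prefix: str, suffix: str) -> Optional[str]:
    """Return the word without the given prefix/suffix, or None if no match."""
    remaining = len(word) - len(prefix) - len(suffix)
    if word.startswith(prefix) and word.endswith(suffix) and remaining >= MIN_STEM:
        return word[len(prefix):len(word) - len(suffix)]
    return None


def stem(word: str, lexicon: Set[str]) -> Tuple[str, List[str]]:
    """Iteratively strip affixes, trying derivational rules before inflectional.

    Stops when the current form is in the lexicon or no rule applies.
    Returns the final form and the labels of the removed affixes, in order.
    """
    removed: List[str] = []
    current = word
    while current not in lexicon:
        # Derivational rules are tried first in every iteration.
        for label, prefix, suffix in DERIVATIONAL + INFLECTIONAL:
            candidate = strip_affix(current, prefix, suffix)
            if candidate is not None:
                current = candidate
                removed.append(label)
                break
        else:
            break  # no rule matched; give up on further stripping
    return current, removed


if __name__ == "__main__":
    toy_lexicon = {"aya", "imah"}  # illustrative root words only
    for w in ("kaayaanna", "imahna", "imah"):
        print(w, stem(w, toy_lexicon))
```

In this sketch a rule only fires when its surface pattern matches, so a word such as "kaayaanna" first loses the outer suffix and only then matches the confix in the next iteration. A real stemmer would need a full rule set and a proper root-word dictionary.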

Publisher

Association for Computing Machinery (ACM)

Subject

General Computer Science

Cited by 8 articles.

1. SUSTEM: An Improved Rule-based Sundanese Stemmer;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-06-21

2. A Systematic Review of Stemmers of Indian and Non-Indian Vernacular Languages;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-01-15

3. Building a Multilevel Inflection Handling Stemmer to Improve Search Effectiveness for Urdu Language;IEEE Access;2024

4. Development of Sundanese Stemmer Based on Morphophonemics;2023 10th International Conference on ICT for Smart Society (ICISS);2023-09-06

5. An Analytical Analysis of Text Stemming Methodologies in Information Retrieval and Natural Language Processing Systems;IEEE Access;2023
