Comparison of Nazief-Adriani and Paice-Husk algorithm for Indonesian text stemming process

Author:

Jumadi J, Maylawati D S, Pratiwi L D, Ramdhani M A

Abstract

Stemming is a pre-processing step that recovers the base form (stem) of a word by conflating or resolving its variants. Every language is unique, and the most popular stemming algorithm for Indonesian text is the Nazief-Adriani algorithm. This study therefore compares the Nazief-Adriani algorithm with another stemming algorithm applied to Indonesian text, the Paice-Husk algorithm, which is commonly used for English. In addition to the two stemming algorithms, the study uses the McCabe cyclomatic complexity metric to evaluate the complexity of each algorithm. In an experiment on 20 sentences containing a thousand words, the Nazief-Adriani algorithm is more accurate than the Paice-Husk algorithm: 91.87% compared to 64.43%. Likewise, in terms of complexity, the Paice-Husk algorithm is more complex than the Nazief-Adriani algorithm. In terms of processing time, however, the Paice-Husk algorithm is slightly faster than the Nazief-Adriani algorithm. These results indicate that the Paice-Husk algorithm requires a more complete implementation of Indonesian morphological and grammatical rules to produce better Indonesian stem words.
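The accuracy figures in the abstract can be read as the share of words for which a stemmer returns the expected stem. The Python sketch below illustrates that reading only; the abstract does not spell out the scoring procedure, so the function name `stemming_accuracy` and the four-word sample are hypothetical and are not taken from the paper's data.

```python
def stemming_accuracy(predicted, gold):
    """Return the fraction of words whose predicted stem equals the reference stem.

    Assumption: accuracy is exact-match agreement with a reference stem,
    which is one common way to score a stemmer.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one word to score")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# Hypothetical toy data (not from the study): Indonesian words,
# their reference stems, and the output of an imagined stemmer.
words = ["membaca", "makanan", "berlari", "pemain"]
gold = ["baca", "makan", "lari", "main"]
predicted = ["baca", "makan", "berlari", "main"]  # third word left unstemmed

accuracy = stemming_accuracy(predicted, gold)
print(f"{accuracy:.2%}")  # prints 75.00%
```

Scoring two stemmers against the same reference list in this way gives directly comparable percentages.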

Publisher

IOP Publishing

Subject

General Medicine


Cited by 5 articles.

1. Unlocking Insights: A Literature Review on Enhanced Confix Stripping and Nazief & Adriani Algorithm Modifications for Makassar Language Text Stemming;International Journal of Innovative Science and Research Technology (IJISRT);2024-03-16

2. Analisis Sentimen: Pengaruh Jam Kerja Terhadap Kesehatan Mental Generasi Z;Journal of Applied Computer Science and Technology;2024-02-03

3. Comparison of Modified Nazief&Adriani and Modified Enhanced Confix Stripping algorithms for Madurese Language Stemming;INTENSIF: Jurnal Ilmiah Penelitian dan Penerapan Teknologi Sistem Informasi;2023-08-05

4. Spelling Correction Using the Levenshtein Distance and Nazief and Adriani Algorithm for Keyword Search Process Indonesian Qur'an Translation;2022 Seventh International Conference on Informatics and Computing (ICIC);2022-12-08

5. Stemming Algorithm for the Indonesian Language: A Scientometric View;2022 IEEE Creative Communication and Innovative Technology (ICCIT);2022-11-22
