Author:
Jumadi J,Maylawati D S,Pratiwi L D,Ramdhani M A
Abstract
Abstract
Stemming is a process contained in the pre-processing stage that recognizes basic words (stem word) by combining or solving each of the variants of a word. Every language is unique, the most popular stemming algorithm for Indonesian text is Nazief-Adriani algorithm. Therefore, this study aims to compare Nazief-Adriani algorithm with another stemming algorithm for Indonesian text, that is Paice-Husk stemming algorithm which is commonly used for English. Beside, Nazief-Adriani and Paice-Husk algorithm for stemming process, this study use McCabe Cyclometic Complexity Metrix to evaluate the complexity of algorithm. Based on the experiment result with 20 sentences as data with a thousand words, the accuracy of the Nazief-Adriani algorithm is better than the Paice-Husk algorithm, which is 91.87% compared to 64.43%. Likewise, in terms of complexity, the algorithm is still more complex Paice-Husk than Nazief-Adriani. However, in terms of processing time, the Paice-Husk algorithm is slightly faster than the Nazief-Adriani algorithm. These results indicate that the Paice-Husk algorithm requires a more complete implementation of Indonesian morphological and grammatical rules to produce the better Indonesian stem words.
Reference33 articles.
1. Text Knowledge Mining: And Approach to Text Mining;Torre;ESTYLF08,2008
2. Text mining;Witten,2004
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献