Author:
Hrp Nur Hasanah, Fikry Muhammad, Yusra Yusra
Abstract
Angkola Batak is one of the Batak languages, spoken in the southern Tapanuli area, where it is still used and maintained as an everyday language. To date, Angkola Batak language resources are not available in a digital form that researchers can use in the analytical stages of natural language processing (NLP). NLP tasks for the Angkola Batak language must follow the stages of text processing, starting from tokenization and continuing through lexical, syntactic, semantic, and pragmatic analysis. This study addresses the first of these analysis stages, namely lexical analysis. At the lexical analysis stage, one of the most important NLP tasks is stemming. Stemming is the process of determining the root word of an affixed word. In this research, an Angkola Batak stemming algorithm based on grammar rules was analyzed and designed. The stages of the research were collecting the grammar rules of the Angkola Batak language, collecting Angkola Batak root words as a database dictionary, and removing affixes from affixed words to recover their root words. The output of this research is an Angkola Batak stemmer implemented in PHP. In tests on 450 words taken from Angkola Batak folklore, 448 test words were stemmed correctly (99.56%) and 2 were stemmed incorrectly (0.44%). The two errors occurred because the root words were not found in the dictionary.
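The dictionary-plus-affix-removal approach described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' PHP stemmer: the affix lists, the three-word dictionary, and the function names `stem` and `accuracy` are all placeholders assumed for the example, and the actual grammar rules and root-word dictionary collected in the study are not reproduced here.

```python
# Placeholder affix lists for illustration only; NOT the grammar rules
# collected in the study.
PREFIXES = ("mar", "di", "tar", "pa")
SUFFIXES = ("hon", "on", "an", "i")

# Toy root-word dictionary standing in for the database dictionary.
ROOT_WORDS = {"dalan", "tutung", "baen"}


def stem(word, roots=ROOT_WORDS):
    """Return the root of `word`, or None if no root is found in `roots`."""
    word = word.lower()
    # Candidates: the word itself, then the word with one prefix removed.
    candidates = [word] + [word[len(p):] for p in PREFIXES if word.startswith(p)]
    for cand in candidates:
        if cand in roots:
            return cand
        # Try removing one suffix and look the remainder up in the dictionary.
        for s in SUFFIXES:
            if cand.endswith(s) and cand[:-len(s)] in roots:
                return cand[:-len(s)]
    # No dictionary match: the stemmer cannot confirm a root.
    return None


def accuracy(pairs):
    """Fraction of (word, expected_root) pairs that `stem` gets right."""
    correct = sum(1 for word, expected in pairs if stem(word) == expected)
    return correct / len(pairs)
```

In this sketch a word whose root is missing from the dictionary yields `None`, which mirrors the cause the abstract gives for the two failed test words. A dictionary check after each removal step is one common way to avoid over-stemming; whether the study orders its rules this way is not stated in the abstract.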
Publisher
Forum Kerjasama Pendidikan Tinggi (FKPT)
Cited by
1 article.
1. Direct Machine Translation Indonesian-Batak Toba;2023 7th International Conference on New Media Studies (CONMEDIA);2023-12-06