Application of Naïve Bayes Algorithm Variations On Indonesian General Analysis Dataset for Sentiment Analysis

Published: 2022-08-22, Volume: 6, Issue: 4, Pages: 585-590
ISSN: 2580-0760
Journal: Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi)
Abbreviated journal title: J. RESTI (Rekayasa Sist. Teknol. Inf.)

Authors:

Umar Najirah, M. Adnan Nur

Abstract

The Indonesian General Analysis Dataset is sourced from Twitter, collected with conjunctions as search keywords so that the tweets are not focused on any particular topic. A general-topic Indonesian dataset of this kind can be used to test the accuracy of classification models, providing an additional reference for choosing suitable methods and parameters for sentiment analysis. One algorithm that has produced the highest accuracy in several studies is naive Bayes, which has several variations. This study aims to identify the naive Bayes variation with the best accuracy on the Indonesian General Analysis Dataset for sentiment analysis by tuning the minimum and maximum document frequency parameters. The variations compared are Bernoulli naive Bayes, Gaussian naive Bayes, complement naive Bayes and multinomial naive Bayes. The research begins with downloading the dataset. Preprocessing follows, consisting of tokenizing, stemming, converting abbreviations and eliminating conjunctions. Feature extraction is then performed on the preprocessed data by converting it into vectors and applying TF-IDF weighting before the sentiment classification stage. Testing applies minimum document frequency (min-df) and maximum document frequency (max-df) values to each naive Bayes variation to find the appropriate parameters. K-fold cross-validation is used to split the dataset into training and test data, and a confusion matrix is then constructed to evaluate accuracy.
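The pipeline outlined in the abstract (preprocessing, TF-IDF with min-df/max-df filtering, naive Bayes classification, k-fold cross-validation and a confusion matrix) can be sketched in plain Python as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation; the abstract does not say which library or parameter values were used. The abbreviation dictionary and conjunction list are hypothetical, stemming is omitted, only the multinomial variant is implemented, folds are interleaved rather than stratified, and the sentences are made-up toy examples rather than the actual dataset. The smoothed IDF formula and L2 normalization mirror the defaults of scikit-learn's `TfidfVectorizer`, which exposes the same `min_df` and `max_df` parameters; scikit-learn also provides `BernoulliNB`, `GaussianNB`, `ComplementNB` and `MultinomialNB` for the four variants compared in the study.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical lookup tables for illustration only; the study's actual
# abbreviation dictionary and conjunction list are not given in the abstract.
ABBREVIATIONS = {"yg": "yang", "gak": "tidak", "bgt": "banget"}
CONJUNCTIONS = {"dan", "atau", "tetapi", "karena", "namun"}


def preprocess(text):
    """Lowercase, tokenize, expand abbreviations, drop conjunctions (no stemming)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    return [t for t in tokens if t not in CONJUNCTIONS]


def build_vocab(docs, min_df=1, max_df=1.0):
    """Keep terms found in >= min_df documents (count) and <= max_df (fraction) of them."""
    df = Counter(term for doc in docs for term in set(doc))
    n = len(docs)
    return {t: c for t, c in df.items() if c >= min_df and c / n <= max_df}


def tfidf(doc, vocab_df, n_docs):
    """Term count * smoothed IDF ln((1+n)/(1+df)) + 1, then L2-normalized."""
    tf = Counter(t for t in doc if t in vocab_df)
    w = {t: c * (math.log((1 + n_docs) / (1 + vocab_df[t])) + 1) for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in w.values())) or 1.0
    return {t: v / norm for t, v in w.items()}


class SimpleMultinomialNB:
    """Multinomial naive Bayes over fractional (TF-IDF) feature weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive (Laplace) smoothing

    def fit(self, X, y, vocab):
        class_counts = Counter(y)
        self.classes = sorted(class_counts)
        self.log_prior = {c: math.log(class_counts[c] / len(y)) for c in self.classes}
        totals = {c: defaultdict(float) for c in self.classes}
        for x, c in zip(X, y):
            for t, w in x.items():
                totals[c][t] += w
        self.log_lik = {}
        for c in self.classes:
            denom = sum(totals[c].values()) + self.alpha * len(vocab)
            self.log_lik[c] = {t: math.log((totals[c][t] + self.alpha) / denom) for t in vocab}
        return self

    def predict(self, x):
        scores = {c: self.log_prior[c] + sum(w * self.log_lik[c][t] for t, w in x.items())
                  for c in self.classes}
        return max(scores, key=scores.get)


def k_fold_evaluate(texts, labels, k=3, min_df=1, max_df=1.0):
    """Interleaved (unstratified) k-fold CV; returns confusion matrix and accuracy."""
    docs = [preprocess(t) for t in texts]
    classes = sorted(set(labels))
    matrix = {actual: {pred: 0 for pred in classes} for actual in classes}
    for fold in range(k):
        train = [i for i in range(len(docs)) if i % k != fold]
        test = [i for i in range(len(docs)) if i % k == fold]
        train_docs = [docs[i] for i in train]
        # Vocabulary and IDF are fitted on the training fold only, to avoid leakage.
        vocab_df = build_vocab(train_docs, min_df, max_df)
        X_train = [tfidf(d, vocab_df, len(train_docs)) for d in train_docs]
        model = SimpleMultinomialNB().fit(X_train, [labels[i] for i in train], vocab_df)
        for i in test:
            pred = model.predict(tfidf(docs[i], vocab_df, len(train_docs)))
            matrix[labels[i]][pred] += 1
    accuracy = sum(matrix[c][c] for c in classes) / len(docs)
    return matrix, accuracy


# Tiny made-up examples, NOT taken from the Indonesian General Analysis Dataset.
TEXTS = [
    "filmnya bagus dan seru bgt",
    "pelayanannya lambat dan jelek",
    "aplikasinya bagus tetapi agak berat",
    "jalanan macet gak nyaman",
    "makanannya enak dan murah",
    "sinyalnya jelek karena hujan",
]
LABELS = ["positive", "negative", "positive", "negative", "positive", "negative"]

if __name__ == "__main__":
    for min_df, max_df in [(1, 1.0), (1, 0.5)]:
        matrix, acc = k_fold_evaluate(TEXTS, LABELS, k=3, min_df=min_df, max_df=max_df)
        print(f"min_df={min_df} max_df={max_df} accuracy={acc:.2f} matrix={matrix}")
```

Looping over `(min_df, max_df)` pairs at the bottom mimics the kind of parameter sweep the abstract describes. On real data the vocabulary would be far larger, folds would normally be stratified by label, and each of the other three variants would replace `SimpleMultinomialNB` in the same loop.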

Publisher

Ikatan Ahli Informatika Indonesia (IAII)

Subject

General Medicine

Cited by 2 articles.

1. Comparative Analysis of Algorithms Naïve Bayes and C45 for Student Satisfaction with Administrative Services. 2023 International Conference of Computer Science and Information Technology (ICOSNIKOM), 2023-11-10.

2. Multi-Class Text Classification on Khmer News Using Ensemble Method in Machine Learning Algorithms. Acta Informatica Pragensia, 2023-10-10.