Affiliation:
1. Universitas Nusa Cendana, Indonesia
Abstract
InDriver is an online transportation service that gives consumers more flexibility in setting prices and choosing drivers. Because comments from InDriver users can shape public perception of the service, a sentiment analysis of these comments is needed. This study aims to identify positive, negative, and neutral sentiment in user comments and to compare the performance of several classification methods. On the imbalanced dataset, the Support Vector Machine (SVM) and Logistic Regression methods achieve the highest accuracy, at 89%. Accuracy alone, however, is not a sufficient measure of quality. In terms of the balance between precision and recall on the minority (neutral) class, the Random Forest method shows a more balanced performance, with an F1-score of 55%. After the dataset is balanced with the SMOTE method, the performance of the Naïve Bayes Classifier improves significantly, particularly on the neutral class, where recall and F1-score are 57% and 52%, respectively. In conclusion, SVM and Logistic Regression achieve high accuracy, but the Random Forest method is recommended when the balance between precision and recall on the minority class is the priority.
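The distinction between accuracy and minority-class F1-score can be illustrated with a minimal sketch. The labels below are hypothetical toy data, not the study's dataset, and the helper names `per_class_metrics` and `accuracy` are introduced here only for illustration. The sketch shows how a classifier that never predicts the neutral class can still reach high accuracy while its neutral-class F1-score is zero.

```python
def per_class_metrics(y_true, y_pred, label):
    """Return (precision, recall, F1) for one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical imbalanced toy data: neutral is the minority class.
y_true = ["positive"] * 6 + ["negative"] * 3 + ["neutral"] * 1
# Hypothetical classifier output that never predicts "neutral".
y_pred = ["positive"] * 6 + ["negative"] * 3 + ["positive"] * 1

print("accuracy:", accuracy(y_true, y_pred))
for label in ("positive", "negative", "neutral"):
    p, r, f = per_class_metrics(y_true, y_pred, label)
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

In this toy case nine of ten predictions are correct, yet the neutral class is never recovered, which is why per-class precision, recall, and F1-score are reported alongside accuracy.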