Semi-supervised Learning Models for Sentiment Analysis on Marketplace Dataset

Published:2022-12-03 Issue:2 Volume:4 Page:78-85
ISSN:2686-6269
Container-title:International Journal of Artificial Intelligence & Robotics (IJAIR)
Short-container-title:Int. J. Artif. Intell. Robot.

Author:

Wisnalmawati Wisnalmawati, Aribowo Agus Sasmito, Herawati Yunie

Abstract

Sentiment analysis aims to categorize opinions, typically by training a model on an annotated corpus. However, building a high-quality, fully annotated corpus takes considerable effort, time, and expense. Semi-supervised learning (SSL) addresses this by automatically adding training data drawn from unlabeled data, so it can ease a labeling process that otherwise requires human expertise and time. This study aims to develop an SSL model for sentiment analysis and to compare the learning capabilities of Naïve Bayes (NB) and Random Forest (RF) within SSL. Our model attempts to annotate opinion documents in Indonesian. We use an ensemble multi-classifier that works on unigram, bigram, and trigram vectors. We test the model on a marketplace dataset of rating comments scraped from Shopee for smartphone products, written in Indonesian. The research proceeded through data preparation, vectorization using TF-IDF, feature extraction, modeling using RF and NB, and evaluation using accuracy and F1-score. The NB model outperformed previous research, an improvement of 5.5%. We conclude that SSL performance depends heavily on the amount of training data and on how well the features or patterns in the documents suit the machine learning algorithm. On our marketplace dataset, Random Forest is the better choice.
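
The abstract does not spell out how unlabeled comments are turned into training data. One common approach is self-training (pseudo-labeling), and the sketch below illustrates that idea only; it is not the authors' implementation. To stay dependency-free it makes several simplifying assumptions: raw n-gram counts stand in for TF-IDF weights, a single Naïve Bayes classifier stands in for the ensemble, and the names `ngrams`, `NaiveBayes`, `self_train`, `threshold`, and `max_rounds` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=3):
    """Return unigram, bigram, and trigram tokens of a whitespace-split text."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace smoothing over n-gram counts."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngrams(text))
        self.vocab = {f for c in self.feature_counts.values() for f in c}
        return self

    def predict_proba(self, text):
        """Return {label: probability} for one document."""
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for feat in ngrams(text):
                if feat in self.vocab:  # skip features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            log_scores[label] = score
        # Normalize log scores into probabilities (numerically stable softmax).
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_rounds=5):
    """Grow the training set with confidently pseudo-labeled documents.

    `threshold` and `max_rounds` are illustrative values, not from the paper.
    Returns the final model, the enlarged training set, and the documents
    that never reached the confidence threshold.
    """
    texts = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = NaiveBayes().fit(texts, labels)
    for _ in range(max_rounds):
        remaining, added = [], 0
        for text in pool:
            proba = model.predict_proba(text)
            label, conf = max(proba.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                texts.append(text)
                labels.append(label)
                added += 1
            else:
                remaining.append(text)
        if added == 0:  # nothing confident enough: stop early
            break
        pool = remaining
        model = NaiveBayes().fit(texts, labels)  # retrain on the larger set
    return model, list(zip(texts, labels)), pool


# Invented toy comments in Indonesian, for illustration only.
labeled = [("barang bagus pengiriman cepat", "pos"),
           ("barang rusak pengiriman lambat", "neg")]
unlabeled = ["bagus cepat", "rusak lambat", "tidak tahu"]
model, train_set, leftover = self_train(labeled, unlabeled, threshold=0.7)
```

Documents whose n-grams never appear in the labeled data receive near-uniform probabilities and stay unlabeled, which is consistent with the abstract's remark that SSL performance depends on how much training data is available and how well the features match the documents.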

Publisher

Dr. Soetomo University

Subject

Polymers and Plastics,General Environmental Science

Reference: 24 articles.

1. H. Imaduddin, Widyawan, and S. Fauziati, “Word Embedding Comparison For Indonesian Language Sentiment Analysis,” Proceeding - 2019 International Conference of Artificial Intelligence and Information Technology, ICAIIT 2019, pp. 426–430, 2019, doi: 10.1109/ICAIIT.2019.8834536.

2. R. Monika, S. Deivalakshmi, and B. Janet, “Sentiment Analysis of US Airlines Tweets Using LSTM/RNN,” Proceedings of the 2019 IEEE 9th International Conference on Advanced Computing, IACC 2019, pp. 92–95, 2019, doi: 10.1109/IACC48062.2019.8971592.

3. A. H. Abdulhafiz, “Novel opinion mining system for movie reviews in Turkish,” International Journal of Intelligent Systems and Applications in Engineering, vol. 8, no. 2, pp. 94–101, 2020, doi: 10.18201/ijisae.2020261590.

4. D. F. Budiono, A. S. Nugroho, and A. Doewes, “Twitter sentiment analysis of DKI Jakarta’s gubernatorial election 2017 with predictive and descriptive approaches,” Proceedings - 2017 International Conference on Computer, Control, Informatics and its Applications: Emerging Trends In Computational Science and Engineering, IC3INA 2017, vol. 2018-Janua, pp. 89–94, 2017, doi: 10.1109/IC3INA.2017.8251746.

5. A. Al-Laith, M. Shahbaz, H. F. Alaskar, and A. Rehmat, “Arasencorpus: A semi-supervised approach for sentiment annotation of a large arabic text corpus,” Applied Sciences (Switzerland), vol. 11, no. 5, 2021, doi: 10.3390/app11052434.

Cited by 1 article.

1. Development and Comparison of Multiple Emotion Classification Models in Indonesia Text Using Machine Learning;Journal of Advances in Information Technology;2024