Efficient Detection of Irrelevant User Reviews Using Machine Learning

Published: 2024-08-07
Volume: 14, Issue: 16, Page: 6900
ISSN: 2076-3417
Journal: Applied Sciences
Language: en

Author:

Kim Cheolgi¹, Kim Hyeon Gyu²

Affiliation:

1. School of EECS, Korea Aerospace University, Hanggongdaehak-ro 76-10, Deogyang-gu, Goyang-si 10540, Republic of Korea

2. Division of Computer Science and Engineering, Sahmyook University, Hwarang-ro 815, Nowon-gu, Seoul 01795, Republic of Korea

Abstract

User reviews such as SNS feeds and blog posts are widely used to extract opinions, complaints, and requirements about a given place or product from the users’ perspective. However, the collection process often returns many reviews that are irrelevant to the given search keyword, and such reviews can distort the results of data analysis. In this paper, we discuss a method to detect irrelevant user reviews efficiently by combining various oversampling and machine learning algorithms. About 35,000 user reviews collected from 25 restaurants and 33 tourist attractions in Ulsan Metropolitan City, South Korea, were used for training; the ratio of irrelevant reviews in the two data sets was 53.7% and 71.6%, respectively. To deal with skewness (class imbalance) in the collected reviews, oversampling algorithms such as SMOTE, Borderline-SMOTE, and ADASYN were used. To build a model for detecting irrelevant reviews, RNN, LSTM, GRU, and BERT were adopted and compared, as they are known to provide high accuracy in text processing. The detection models were evaluated experimentally, and the BERT model performed best, with an F1 score of 0.965.
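
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of SMOTE-style oversampling and of the F1 score. It is not the authors' implementation. It assumes that reviews have already been converted to numeric feature vectors, and the function names (`smote_like_oversample`, `f1_score`) and the parameters `k` and `seed` are hypothetical choices made for this illustration.

```python
import random
from typing import List, Sequence

Vector = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority: Sequence[Vector], n_new: int,
                          k: int = 3, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic minority samples (illustrative, SMOTE-style).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest neighbours of `base` within the minority class.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the `positive` label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Borderline-SMOTE and ADASYN differ mainly in which minority samples they choose as the base for interpolation; the sketch above picks them uniformly at random.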

Publisher

MDPI AG

Link

https://www.mdpi.com/2076-3417/14/16/6900/pdf
