Abstract
In this paper, we propose a semi-supervised method for clustering unstructured textual data, called semi-supervised sentiment clustering on natural language texts. The aim is to identify clusters that are homogeneous with respect to the overall sentiment of the texts analyzed. The method combines three techniques: Sentiment Analysis, the Threshold-based Naïve Bayes classifier, and Network-based Semi-supervised Clustering. It proceeds in three steps. In the first step, the unstructured text is transformed into structured text and categorized into positive or negative classes using a sentiment analysis algorithm. In the second step, the Threshold-based Naïve Bayes classifier is applied to identify the overall sentiment of the texts and to assign a specific sentiment value to each topic. In the last step, Network-based Semi-supervised Clustering is applied to partition the instances into disjoint groups. The proposed algorithm is tested on a collection of reviews written by customers on Booking.com. The results show that the algorithm identifies clusters that are distinct, non-overlapping, and homogeneous with respect to the overall sentiment. The results are also easy to interpret, because the network representation of the instances helps to show the relationships between them.
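The second step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a simplified reading of a threshold-based Naïve Bayes classifier in which each word receives a Laplace-smoothed log-odds score (presence in positive versus negative reviews), a review's score is the sum over its distinct words, and a threshold `tau` decides the class. The function names, the smoothing constant `alpha`, the default `tau = 0`, and the toy reviews are all hypothetical. The labels stand in for the output of the first step, and the network-based clustering of the last step is not covered.

```python
import math
from collections import Counter


def train_log_odds(docs, labels, alpha=1.0):
    """Per-word log-odds of appearing in positive vs. negative documents.

    Uses word presence (not frequency) and Laplace smoothing `alpha`.
    """
    pos_counts, neg_counts = Counter(), Counter()
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    for text, y in zip(docs, labels):
        words = set(text.lower().split())  # count each word once per document
        (pos_counts if y == 1 else neg_counts).update(words)
    vocab = set(pos_counts) | set(neg_counts)
    return {
        w: math.log((pos_counts[w] + alpha) / (n_pos + 2 * alpha))
        - math.log((neg_counts[w] + alpha) / (n_neg + 2 * alpha))
        for w in vocab
    }


def score(text, log_odds):
    """Sum the log-odds of the distinct known words in `text`."""
    return sum(log_odds.get(w, 0.0) for w in set(text.lower().split()))


def classify(text, log_odds, tau=0.0):
    """Return 1 (positive) if the score exceeds the threshold, else 0."""
    return 1 if score(text, log_odds) > tau else 0


# Hypothetical toy reviews; labels (1 = positive, 0 = negative) are assumed
# to come from the sentiment analysis step.
docs = [
    "clean room friendly staff",
    "dirty room rude staff",
    "friendly staff great location",
    "dirty bathroom noisy street",
]
labels = [1, 0, 1, 0]
log_odds = train_log_odds(docs, labels)
```

In this sketch the per-word scores double as interpretable sentiment values: a word such as "friendly" gets a positive score and "dirty" a negative one, while a word that is equally common in both classes, such as "room", scores zero. Moving `tau` away from zero trades off false positives against false negatives.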
Funder
Università degli Studi di Cagliari
Publisher
Springer Science and Business Media LLC
Subject
Statistics, Probability and Uncertainty; Statistics and Probability
Cited by
1 article.
1. Research on Rural Tourism Feature Classification Method Based on Hierarchical Cluster Analysis. 2024 Second International Conference on Data Science and Information System (ICDSIS), 2024-05-17