Abstract
As the amount of content created on social media keeps increasing, people express more and more opinions and sentiments on a wide range of subjects. Sentiment analysis and opinion mining techniques can therefore be valuable for the automatic analysis of huge textual corpora (comments, reviews, tweets, etc.). Despite advances in text mining algorithms, deep learning techniques, and text representation models, results on such tasks are very good only for a few high-density languages (e.g., English) that possess large training corpora and rich linguistic resources; for lower-density languages there is still room for improvement. In this direction, the current work employs various language models to represent Greek social media texts, together with various text classifiers, in order to detect the polarity of opinions expressed on social media. The experimental results on a related dataset collected by the authors are promising: various classifiers (naive Bayes, random forests, support vector machines, logistic regression, deep feed-forward neural networks) built on language-model representations outperform those built on word- or sentence-based embeddings (word2vec, GloVe), achieving a classification accuracy of more than 80%. Additionally, a new language model for Greek social media has been trained on the aforementioned dataset, showing that language models trained on domain-specific corpora can outperform generic language models by a margin of 2%. Finally, the resulting models are made freely available to the research community.
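The abstract does not specify how texts are turned into feature vectors or how the classifiers are trained. A minimal sketch of one common setup, assuming mean-pooled word vectors as the text representation and logistic regression (one of the classifiers named above) as the polarity classifier, could look like this. The embedding table, the Greek example sentences, and all function names (`embed`, `train_logreg`, `predict`) are hypothetical. The sketch illustrates the general idea only and is not the authors' implementation; in practice the vectors would come from word2vec, GloVe, or a pretrained language model.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy embedding table, NOT a trained model: each token maps to a
# 3-dimensional vector. A real pipeline would look these up in word2vec/GloVe
# or take them from a pretrained language model.
EMBEDDINGS: Dict[str, List[float]] = {
    "τέλειο": [0.9, 0.1, 0.2],  # "perfect"
    "καλό": [0.8, 0.2, 0.1],    # "good"
    "κακό": [0.1, 0.9, 0.2],    # "bad"
    "χάλια": [0.0, 1.0, 0.3],   # "awful"
    "κινητό": [0.4, 0.4, 0.9],  # "mobile phone" (neutral)
}
DIM = 3


def embed(text: str) -> List[float]:
    """Mean-pool the vectors of known tokens into one fixed-size text vector."""
    vecs = [EMBEDDINGS[tok] for tok in text.lower().split() if tok in EMBEDDINGS]
    if not vecs:
        return [0.0] * DIM  # no known tokens: fall back to the zero vector
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[List[float]], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 500) -> Tuple[List[float], float]:
    """Fit binary logistic regression with full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the cross-entropy loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, text: str) -> int:
    """Return 1 (positive polarity) or 0 (negative polarity) for a text."""
    x = embed(text)
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Hypothetical labelled examples: 1 = positive, 0 = negative polarity.
TRAIN = [("τέλειο κινητό", 1), ("καλό κινητό", 1),
         ("κακό κινητό", 0), ("χάλια κινητό", 0)]
weights, bias = train_logreg([embed(t) for t, _ in TRAIN], [lbl for _, lbl in TRAIN])
print(predict(weights, bias, "καλό"), predict(weights, bias, "χάλια"))
```

Keeping `embed` fixed and swapping `train_logreg` for another classifier (naive Bayes, random forest, SVM, feed-forward network) is one way to compare classifiers over a shared text representation, in the spirit of the comparison the abstract describes.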
Funder
Operational Program Competitiveness, Entrepreneurship and Innovation of Greece, call RESEARCH - CREATE - INNOVATE
Cited by
24 articles.