Authors:
Basina Polina A., Dunaeva Darya O., Sarkisova Anna Yu.
Abstract
Sentiment analysis is one of the most in-demand natural language processing operations for solving applied problems. One of the key methods of automated sentiment analysis is supervised machine learning. Although many ready-made solutions for determining sentiment are available, the models produce significant errors because the linguistic expression of emotions is complex and context-dependent. The article presents the results of validating 6 models for determining the sentiment of Russian-language publications on a research validation dataset: 300 expert-labeled statements extracted from social network messages on the subject of quality of life, each corresponding to one of three sentiment types (positive, negative, neutral). To evaluate the performance of the models, inter-annotator agreement coefficients were used, specifically Krippendorff's alpha, Cohen's kappa, and Fleiss' kappa. The obtained values showed a low level of agreement between the expert labels and the labels assigned by the models. Among the experiments performed, the lowest agreement coefficients were obtained for the Blanchefort model trained on Rusentiment data, and the highest for the model by the same developer trained on medical feedback data. Based on the results, conclusions were drawn about the most common causes of disagreement in sentiment determination by machine learning models. The models correctly identify the sentiment of texts that contain vivid lexical markers whose sentiment matches the overall sentiment of the statement. Conversely, a model has trouble determining the sentiment of an emotionally charged message when that message contains a word with the opposite sentiment.
Emotive vocabulary that does not match the sentiment of the whole statement, marker words used in non-literal meanings, uppercase text, and complicated forms of communication (including irony and sarcasm) remain risk factors when relying on automated analysis: with a high degree of probability, the automatic classification model will fail to determine the sentiment of the text correctly. The main reason for these difficulties is that the task requires attending to the utterance as an integral unit rather than to individual formal indicators. The utterance is the minimal communicative unit of speech, and capturing its semantic and emotional-expressive integrity is an extremely demanding task for machine learning models in sentiment analysis. It is therefore still difficult to trust machine learning models with a task as complex as the automated categorization of emotions. Prospects for research in this area should be associated, first of all, with the development of high-quality, linguistically sound training datasets.
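
To illustrate how agreement between expert labels and model labels can be quantified, the following is a minimal sketch of Cohen's kappa, one of the coefficients named above, written in plain Python. It is not the authors' implementation. The function name `cohen_kappa` and the example label lists are hypothetical and do not come from the study's dataset. The sketch assumes nominal labels and exactly two raters (here, one expert and one model).

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items (nominal labels)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, computed from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_chance == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        # Returning 1.0 here is a choice made for this sketch only.
        return 1.0
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical labels for illustration only, not data from the study.
expert = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
model = ["positive", "neutral", "neutral", "negative", "neutral", "neutral"]
print(f"kappa = {cohen_kappa(expert, model):.3f}")
```

A value near 1 indicates agreement well above chance, and a value near 0 indicates agreement no better than chance. Comparing more than two label sources at once would call for Fleiss' kappa or Krippendorff's alpha instead.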