Validation of machine learning models for automated sentiment determination of Russian-language texts

Author:

Basina Polina A., Dunaeva Darya O., Sarkisova Anna Yu.

Abstract

Sentiment analysis is one of the most in-demand natural language processing operations for solving applied problems, and supervised machine learning is one of the key methods for automating it. Although many ready-made solutions for sentiment detection are available, the models produce significant errors because the linguistic expression of emotions is complex and context-dependent. The article presents the results of validating 6 models for determining the sentiment of Russian-language publications against a research validation dataset: 300 expert-labeled statements extracted from social network messages on the subject of quality of life, each corresponding to one of three sentiment types (positive, negative, neutral). To evaluate the performance of the models, inter-annotator agreement coefficients were used, in particular Krippendorff’s alpha, Cohen’s kappa and Fleiss’s kappa. The obtained values showed a low level of agreement between the expert labels and the labels assigned by the models. Among the experiments performed, the lowest agreement coefficients were obtained for the Blanchefort model trained on RuSentiment data, and the highest for the model by the same developer trained on medical feedback data. Based on these results, conclusions were drawn about the most common causes of disagreement in sentiment determination by machine learning models. The models correctly identify the tone of a text if it contains vivid lexical markers whose tone matches the overall tone of the statement. Conversely, a word of the opposite tone in an emotionally charged message tends to cause the model to misclassify it.
Emotive vocabulary that does not match the tone of the entire statement, marker words used in non-literal meanings, uppercase text, and forms of complicated communication (including irony and sarcasm) remain risk factors when relying on automated analysis: with a high degree of probability, the automatic classification model will not determine the tone of the text correctly. The main reason for these “difficulties” is that the task requires the model to treat the utterance as an integral unit rather than to rely on individual formal indicators. The utterance is the minimal communicative unit of speech, and capturing its semantic and emotionally expressive integrity is the overarching challenge for machine learning models in sentiment analysis. It is therefore still difficult to trust machine learning models with a task as complex as the automated categorization of emotions. Prospects for research in this area should be associated, first of all, with the development of high-quality, linguistically sound training datasets.
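
As an illustration of the agreement coefficients mentioned in the abstract, the following minimal sketch computes Cohen's kappa between one set of expert labels and one set of model labels. It is not the authors' code: the function name `cohen_kappa` and the six toy labels are assumptions made for this example only and are not taken from the study's dataset or results.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels.

    Hypothetical helper written for illustration; not from the article.
    """
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(labels_a)

    # Observed agreement: share of items on which both raters give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected chance agreement, from each rater's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical category; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy data, invented for this sketch (NOT the study's labels).
expert_labels = ["positive", "negative", "neutral", "negative", "positive", "neutral"]
model_labels = ["positive", "neutral", "neutral", "negative", "negative", "neutral"]

kappa = cohen_kappa(expert_labels, model_labels)
print(f"Cohen's kappa: {kappa:.2f}")
```

In this toy case the two label sets agree on 4 of 6 items, but one third of that agreement is expected by chance, so kappa is lower than the raw agreement rate. Krippendorff's alpha and Fleiss's kappa follow the same idea of correcting observed agreement for chance, with different definitions of expected agreement.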

Publisher

Tomsk State University

Subject

General Medicine
