Abstract
The paper attempts to classify corruption-related content in Russian-language and English-language online media using machine learning methods. The proposed methodological approach is relevant and promising because, according to our earlier findings, the corruption monitoring mechanisms described in foreign publications and based on advanced information technologies have rather limited potential effectiveness, and their results are not always adequately interpreted. The study sets out the principles and grounds for selecting identification parameters and describes in detail the annotation scheme applied to the collected news corpus. Automatic text processing was carried out in two stages (vectorization of the text and application of a learning model) and addressed four main tasks: extracting the significant quote from a news article that identifies the text as corruption-related; predicting the type of news message; predicting the relevant article of the Criminal Code of the Russian Federation, which determines liability for the described corruption offense; and predicting the type of relationship involved in the corruption offense. The results show that modern methods of automatic text processing cope successfully with identifying and classifying corruption-related content in both Russian and English.
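The two-stage processing described in the abstract (vectorization, then a learning model) can be illustrated with a minimal sketch. The abstract does not name the vectorizer or the model used in the study, so the bag-of-words counts and the nearest-centroid classifier below are stand-in assumptions chosen for brevity. The function names, the labels, and the toy sentences are hypothetical and are not taken from the paper's data.

```python
import math
from collections import Counter, defaultdict


def vectorize(text):
    """Stage 1: turn a text into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    """Stage 2 (fit): sum the vectors of each class into one centroid."""
    centroids = defaultdict(Counter)
    for text, label in samples:
        centroids[label].update(vectorize(text))
    return dict(centroids)


def predict(centroids, text):
    """Stage 2 (predict): pick the class whose centroid is most similar."""
    vec = vectorize(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical toy examples, invented for illustration only.
TOY_SAMPLES = [
    ("official arrested for taking a bribe", "corruption"),
    ("mayor charged with bribe and embezzlement", "corruption"),
    ("team wins the football match", "other"),
    ("new museum opens in the city", "other"),
]

model = train(TOY_SAMPLES)
print(predict(model, "governor accused of taking a bribe"))
```

The same two-stage shape would apply to each of the four tasks listed in the abstract, with a different label set per task (for example, news type or Criminal Code article); a realistic system would replace both stages with stronger components.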
Publisher
Federal Center of Theoretical and Applied Sociology of the Russian Academy of Sciences (FCTAS RAS)
Cited by
2 articles.