Authors:
Rytsarev I A, Paringer R A, Kupriyanov A V, Kirsh D V
Abstract
In this paper, we propose an approach to analyzing social groups and their relative positioning, based on identifying semantic differences between texts represented as frequency dictionaries. The initial textual data were obtained by collecting records from thematic Internet communities. To collect the entries, we implemented a specialized software module that downloads and analyzes posts and comments from open communities of interest in the social network VKontakte. The frequency dictionary compilation algorithm we developed evaluates the characteristics of the data collected from social networks. For keyword identification, we propose a new approach based on analyzing the word frequency distribution with methods for reducing the dimension of feature spaces. The presented algorithm, which uses principal component analysis, made it possible to assess the significance of words from the coefficients of the linear transformation. Along with the keywords, we identified semantic differences between social network communities and estimated their relative positioning in the transformed feature space.
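As a rough illustration of the pipeline the abstract describes, the sketch below builds a frequency dictionary per community, applies principal component analysis to the resulting community-by-word matrix, and ranks words by the absolute value of their coefficients in each component. It is not the authors' implementation: the function names, the whitespace tokenization, the use of relative frequencies, and the toy community texts are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def frequency_dictionary(texts):
    """Relative word frequencies for one community (naive whitespace tokens)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def pca_keywords(community_texts, n_components=2, top_k=3):
    """Project communities with PCA and rank words by component coefficients."""
    names = sorted(community_texts)
    dicts = [frequency_dictionary(community_texts[name]) for name in names]
    vocab = sorted(set().union(*dicts))

    # Rows are communities, columns are words from the shared vocabulary.
    X = np.array([[d.get(word, 0.0) for word in vocab] for d in dicts])
    Xc = X - X.mean(axis=0)  # center each word's frequency across communities

    # PCA via SVD: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[: min(n_components, Vt.shape[0])]

    # Coordinates of each community in the reduced feature space.
    positions = Xc @ components.T

    # Words with the largest absolute coefficients act as keyword candidates.
    keywords = [
        [vocab[i] for i in np.argsort(-np.abs(component))[:top_k]]
        for component in components
    ]
    return names, positions, keywords


# Hypothetical toy data standing in for collected posts and comments.
communities = {
    "cooking": ["fresh pasta recipe", "bake bread recipe"],
    "football": ["match goal score", "team match goal"],
    "travel": ["cheap flight hotel", "beach hotel trip"],
}
names, positions, keywords = pca_keywords(communities)
for name, position in zip(names, positions):
    print(name, np.round(position, 3))
print(keywords)
```

Distances between the rows of `positions` give one possible reading of the "relative positioning" of communities, and the ranked words per component give one possible reading of the keywords. A real corpus would need proper tokenization, stop-word handling, and many more communities than this toy example.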
Subject
General Physics and Astronomy