Authors:
Dudău Diana Paula, Sava Florin Alin
Abstract
Today, a range of computer-aided techniques can convert text into data. However, compared to traditional content analysis, they bring vulnerabilities as well as strengths. One challenge that has gained increasing attention is performing automatic language analysis that supports sound inferences in a multilingual assessment setting. The current study is the first to test the equivalence of multiple language versions of Linguistic Inquiry and Word Count 2015 (LIWC2015), one of the most appealing and widely used lexicon-based tools worldwide. For this purpose, we employed supervised learning in a classification problem and computed Pearson's correlations and intraclass correlation coefficients on a large corpus of parallel texts in English, Dutch, Brazilian Portuguese, and Romanian. Our findings suggest that LIWC2015 is a valuable tool for multilingual analysis, but within-language standardization is needed when the aim is to analyze texts sourced from different languages.
Funder
Ministerul Educației și Cercetării Științifice
Cited by
17 articles.