Abstract
The exponential growth of social media has been accompanied by a rise in hostile online communication and vitriolic discourse, and social media platforms have become fertile ground for heated discussions that frequently escalate into insulting and offensive language. Lexical resources containing specific negative words have been widely employed to detect uncivil communication. This paper describes the development and implementation of a new resource, the Revised HurtLex Lexicon, in which every headword is annotated with an offensiveness score. The starting point is HurtLex, a multilingual lexicon of hate words. Concentrating on the Italian entries, we revised the terms in HurtLex and derived an offensiveness score for each lexical item by applying an Item Response Theory model to the ratings provided by a large number of annotators. This resource can be used as part of a lexicon-based approach to track offensive and hateful content. The paper also includes an evaluation of the Revised HurtLex lexicon.
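To make the idea of a lexicon-based approach concrete, the following is a minimal sketch of how per-headword offensiveness scores might be used to flag a text. It is an illustration under stated assumptions, not the authors' method: the headwords (`termine_a`, `termine_b`, `termine_c`), their scores, the max-score aggregation, and the `0.5` threshold are all hypothetical and are not taken from the Revised HurtLex Lexicon.

```python
import re

# Hypothetical toy lexicon: placeholder headwords with made-up scores.
# These are NOT entries or scores from the Revised HurtLex Lexicon.
TOY_LEXICON = {
    "termine_a": 0.9,
    "termine_b": 0.4,
    "termine_c": 0.1,
}


def offensiveness(text, lexicon=TOY_LEXICON):
    """Return (highest matched score, matched headwords in order of appearance)."""
    # Lowercase and split into word tokens; \w+ keeps underscores inside tokens.
    tokens = re.findall(r"\w+", text.lower())
    matched = [t for t in tokens if t in lexicon]
    # Assumption: a text is as offensive as its most offensive headword.
    top = max((lexicon[t] for t in matched), default=0.0)
    return top, matched


def is_flagged(text, threshold=0.5, lexicon=TOY_LEXICON):
    """Flag a text when its highest headword score reaches the threshold."""
    top, _ = offensiveness(text, lexicon)
    return top >= threshold
```

Taking the maximum is only one possible aggregation; a mean or a weighted sum over matched headwords would be equally plausible, and a real pipeline for Italian would likely need lemmatization before lookup.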
Funder
Università degli Studi G. D'Annunzio Chieti Pescara
Publisher
Springer Science and Business Media LLC
Subject
General Social Sciences, Statistics and Probability