Abstract
We present the SFU Opinion and Comments Corpus (SOCC), a collection of opinion articles and the comments posted in response to them. The articles comprise all opinion pieces published in the Canadian newspaper The Globe and Mail over the five-year period from 2012 to 2016. In total, the corpus contains 10,339 articles and 663,173 comments. SOCC is part of a project that investigates the linguistic characteristics of online comments. The corpus can be used to study a host of pragmatic phenomena. Among other aspects, researchers can explore: the connections between articles and comments; the connections of comments to each other; the types of topics discussed in comments; the nice (constructive) or mean (toxic) ways in which commenters respond to each other; how language is used to convey very specific types of evaluation; and how negation affects the interpretation of evaluative meaning in discourse. Our current focus is the study of constructiveness and evaluation in the comments. To that end, we have annotated a subset of the large corpus (1,043 comments) with four layers of annotations: constructiveness, toxicity, negation and Appraisal (Martin and White, The language of evaluation, Palgrave, New York, 2005). This paper describes the corpus, the data collection process and the characteristics of the corpus, and it details the annotations. While our focus is comments posted in response to opinion news articles, the phenomena in this corpus are likely to be present in many commenting platforms: other news comments, comments and replies in fora such as Reddit, feedback on blogs, or YouTube comments.
Funder
Social Sciences and Humanities Research Council of Canada
Nvidia
Publisher
Springer Science and Business Media LLC
Subject
Computer Science Applications, Linguistics and Language, Language and Linguistics
Cited by
43 articles.