Abstract
This article is part of a larger project that aims to identify discursive strategies in social media discourse on gender diversity. For that project, roughly 350,000 comments were scraped from the comment sections of YouTube videos on the topic. The article compares methods for standardizing social media data in order to facilitate further processing. Specifically, the data are corrected for casing, spelling, and punctuation. Several tools and models (LanguageTool, T5, seq2seq, GPT-2) were tested. The German GPT-2 model achieved the best outcome: it scored highest on all of the evaluation metrics applied (ROUGE, GLEU, BLEU), making it the best of the tested models for Grammatical Error Correction in German social media data.
Publisher
Universitat Politècnica de València