Abstract
In this article, we present the design of a second language (L2) formative feedback system that provides graphical reports on the linguistic complexity of texts written by university-level English for special purposes students. The system is evaluated against the features of formative instruction identified in the literature. The significance of the complexity metrics is also evaluated. A learner corpus of English, classified according to the Common European Framework of Reference for Languages (CEFR), was processed using a pipeline that computes 83 complexity metrics. Using analysis of variance (ANOVA) testing, multinomial logistic regression, and clustering methods, we identified and validated a set of nine metrics that are significant for distinguishing proficiency levels. Validation by classification gave balanced accuracies of 67.51% (A level), 60.16% (B level), and 60.47% (C level). Clustering showed between 53.10% and 67.37% homogeneity, depending on the level. These nine metrics were then used to create graphical reports on the linguistic complexity of learner writing. The reports are designed to help language teachers diagnose their students’ writing by comparing it with prerecorded cohorts of different proficiency levels.
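The per-level balanced accuracy figures reported above can be read as the mean of sensitivity and specificity for one CEFR level against the others. The following is a minimal sketch under that one-vs-rest assumption; the function name `balanced_accuracy` and the toy labels are hypothetical and are not taken from the article, whose exact computation may differ.

```python
def balanced_accuracy(y_true, y_pred, level):
    """One-vs-rest balanced accuracy for a single level.

    Assumption: balanced accuracy = (sensitivity + specificity) / 2,
    treating `level` as the positive class and all other levels as negative.
    Both the target level and at least one other level must occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == level and p == level for t, p in pairs)  # true positives
    fn = sum(t == level and p != level for t, p in pairs)  # false negatives
    tn = sum(t != level and p != level for t, p in pairs)  # true negatives
    fp = sum(t != level and p == level for t, p in pairs)  # false positives
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy data (invented for illustration only): true and predicted CEFR bands.
y_true = ["A", "A", "B", "B", "C", "C"]
y_pred = ["A", "B", "B", "B", "C", "A"]

for level in ("A", "B", "C"):
    print(level, balanced_accuracy(y_true, y_pred, level))
```

Unlike plain accuracy, this measure is not inflated when one proficiency level dominates the corpus, which is presumably why a per-level figure is reported.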
Subject
Computer Science Applications, Linguistics and Language, Language and Linguistics, Education