Abstract
Automatic grading models are valued for the time and effort saved during the instruction of large student bodies. Especially with the increasing digitization of education and interest in large-scale standardized testing, the popularity of automatic grading has risen to the point where commercial solutions are widely available and used. However, for short answer formats, automatic grading is challenging due to natural language ambiguity and versatility. While automatic short answer grading models are beginning to compare to human performance on some datasets, their robustness, especially to adversarially manipulated data, is questionable. Exploitable vulnerabilities in grading models can have far-reaching consequences, ranging from cheating students receiving undeserved credit to undermining automatic grading altogether, even when most predictions are valid. In this paper, we devise a black-box adversarial attack tailored to the educational short answer grading scenario to investigate the grading models' robustness. In our attack, we insert adjectives and adverbs into natural places of incorrect student answers, fooling the model into predicting them as correct. We observed a loss of prediction accuracy between 10 and 22 percentage points using the state-of-the-art models BERT and T5. While our attack made answers appear less natural to humans in our experiments, it did not significantly increase the graders' suspicions of cheating. Based on our experiments, we provide recommendations for utilizing automatic grading systems more safely in practice.
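As an illustration of the attack setting described above, the sketch below shows one way such a black-box insertion attack could be organized. The `grade_answer` callback standing in for the attacked model, the candidate modifier list, and the try-every-gap insertion heuristic are assumptions for illustration only; the paper targets natural insertion points for adjectives and adverbs rather than exhaustive enumeration.

```python
# Minimal sketch of a black-box insertion attack on a short answer grader.
# Assumptions (not from the paper): grade_answer is a stand-in for the attacked
# model and returns 1 for "correct", 0 for "incorrect"; the modifier list and
# the insert-at-every-gap strategy are illustrative simplifications.

import itertools
from typing import Callable, List, Optional

CANDIDATE_MODIFIERS = ["basically", "really", "actually", "simple", "general"]


def insertion_variants(answer: str, modifiers: List[str]) -> List[str]:
    """Generate answer variants with one modifier inserted at each token gap."""
    tokens = answer.split()
    variants = []
    for word, pos in itertools.product(modifiers, range(len(tokens) + 1)):
        variants.append(" ".join(tokens[:pos] + [word] + tokens[pos:]))
    return variants


def attack(answer: str, grade_answer: Callable[[str], int]) -> Optional[str]:
    """Query the black-box grader on each variant and return the first one
    that flips the label from incorrect (0) to correct (1), if any."""
    if grade_answer(answer) == 1:
        return answer  # already graded as correct, nothing to do
    for variant in insertion_variants(answer, CANDIDATE_MODIFIERS):
        if grade_answer(variant) == 1:
            return variant
    return None  # no successful adversarial variant found


if __name__ == "__main__":
    # Toy grader for demonstration: accepts any answer containing "really".
    toy_grader = lambda a: int("really" in a)
    print(attack("photosynthesis produces oxygen", toy_grader))
```

In the black-box setting, the attacker only needs query access to the grader's predictions, which is why each candidate insertion is checked by calling the model rather than by inspecting its parameters.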
Funder
Hessian State Chancellery in the Department of Digital Strategy and Development
Technische Universität Darmstadt
Publisher
Springer Science and Business Media LLC
Subject
Computational Theory and Mathematics, Education
Cited by
2 articles.