Text-based Question Difficulty Prediction: A Systematic Review of Automatic Approaches

Authors:

Samah AlKhuzaey, Floriana Grasso, Terry R. Payne, Valentina Tamma

Abstract

Designing and constructing pedagogical tests that contain items (i.e., questions) which measure various types of skills equitably for students at different levels is a challenging task. Teachers and item writers alike need to ensure that the quality of assessment materials is consistent if student evaluations are to be objective and effective. Assessment quality and validity are therefore heavily reliant on the quality of the items included in the test, and the notion of difficulty is an essential factor in determining the overall quality of both the items and the resulting tests. Thus, item difficulty prediction is extremely important in any pedagogical learning environment. Although difficulty is traditionally estimated either by experts or through pre-testing, such methods are criticised for being costly, time-consuming, subjective and difficult to scale; consequently, the use of automatic approaches as proxies for these traditional methods is gaining increasing traction. In this paper, we provide a comprehensive and systematic review of methods for the a priori prediction of question difficulty. The aims of this review are to: 1) provide an overview of the research community in terms of the publication landscape; 2) explore the use of automatic, text-based prediction models; 3) summarise influential difficulty features; and 4) examine the performance of the prediction models. Supervised machine learning models were found to be the most commonly used means of overcoming the limitations of traditional item calibration methods. Moreover, linguistic features were found to play a major role in determining item difficulty levels, and several syntactic and semantic features have been explored by researchers in this area to explain the difficulty of pedagogical assessments. Based on these findings, a number of challenges for the item difficulty prediction community are posed, including the need for a publicly available repository of standardised datasets and further investigation into alternative feature elicitation and prediction models.
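The general approach summarised in the abstract, training a supervised model on text-derived linguistic features as a proxy for expert calibration or pre-testing, can be illustrated with a minimal sketch. The features (question length, average word length, a rough clause-marker count), the toy calibrated items, and the choice of scikit-learn's RandomForestRegressor are all hypothetical placeholders for illustration, not features, data, or models taken from the surveyed studies.

```python
# Illustrative sketch only: predicting item difficulty from simple
# text-based linguistic features with a supervised regression model.
# Feature choices, example items, and difficulty values are hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def extract_features(question):
    """Toy lexical/syntactic features: length, avg word length, clause markers."""
    words = question.split()
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / max(n_words, 1)
    # Very rough proxy for syntactic complexity: count of clause markers.
    n_clauses = sum(question.count(marker) for marker in (",", ";", "which", "that"))
    return [n_words, avg_word_len, n_clauses]

# Hypothetical calibrated items: (question text, empirical difficulty in [0, 1]).
items = [
    ("What is the capital of France?", 0.10),
    ("Which of the following statements about mitochondrial DNA replication is false?", 0.72),
    ("Define osmosis.", 0.25),
    ("Explain why increasing the sample size reduces the standard error of the mean.", 0.60),
]

X = np.array([extract_features(q) for q, _ in items])
y = np.array([d for _, d in items])

# Supervised regression as a stand-in for pre-testing-based calibration.
model = RandomForestRegressor(n_estimators=100, random_state=0)
scores = cross_val_score(model, X, y, cv=2, scoring="neg_mean_absolute_error")
print("Cross-validated MAE:", -scores.mean())
```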

Funder

Saudi Arabian Cultural Bureau

Umm Al-Qura University

Publisher

Springer Science and Business Media LLC

Subject

Computational Theory and Mathematics, Education
