Affiliation:
1. CATALPA FernUniversität in Hagen
2. DIPF | Leibniz Institute for Research and Information in Education
3. Centre for International Student Assessment (ZIB)
Abstract
In this article, we systematize the factors influencing the performance and feasibility of automatic content scoring methods for short text responses. We argue that performance (i.e., how well an automatic system agrees with human judgments) mainly depends on the linguistic variance seen in the responses, and that this variance is indirectly influenced by other factors such as target population or input modality. Extending previous work, we distinguish conceptual, realization, and nonconformity variance, which are differentially impacted by the various factors. While conceptual variance relates to the different concepts embedded in the text responses, realization variance refers to their diverse manifestation through natural language. Nonconformity variance is added by aberrant response behavior. Furthermore, besides its performance, the feasibility of using an automatic scoring system depends on external factors, such as ethical or computational constraints, which influence whether a system with a given performance is accepted by stakeholders. Our work provides (i) a framework for assessment practitioners to decide a priori whether automatic content scoring can be successfully applied in a given setup, as well as (ii) new empirical findings, together with an integration of findings from the literature, on factors that influence the performance of automatic systems.
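
The abstract defines performance as the degree of agreement between an automatic scoring system and human judgments. In the content scoring literature, such agreement is commonly quantified with quadratically weighted kappa; the sketch below is illustrative only (not taken from the article, and the score values are hypothetical) and shows one way such agreement could be computed with scikit-learn.

# Illustrative sketch (not from the article): quantifying human-machine
# agreement on a short-answer scoring task with quadratically weighted kappa.
from sklearn.metrics import cohen_kappa_score

# Hypothetical scores on a 0-2 rubric.
human_scores = [2, 1, 0, 2, 1, 1, 0, 2]    # ratings assigned by a human scorer
system_scores = [2, 1, 1, 2, 0, 1, 0, 2]   # predictions of an automatic scoring system

qwk = cohen_kappa_score(human_scores, system_scores, weights="quadratic")
print(f"Quadratically weighted kappa: {qwk:.2f}")

Higher values indicate closer agreement with the human ratings; in practice, what counts as acceptable agreement also depends on the external feasibility factors the article discusses.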
Cited by
2 articles.
1. From the Automated Assessment of Student Essay Content to Highly Informative Feedback: a Case Study. International Journal of Artificial Intelligence in Education, 2024-01-25.
2. A Method of Computer Automatic Scoring for Subjective Questions. 2023 2nd International Conference on Artificial Intelligence, Human-Computer Interaction and Robotics (AIHCIR), 2023-12-08.