Abstract
Argumentation, a key scientific practice presented in the Framework for K-12 Science Education, requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging machine learning (ML) and artificial intelligence (AI) to aid the scoring of written arguments in complex assessments. Moreover, research suggests that features of the assessment construct (i.e., complexity, diversity, and structure) are critical to ML scoring accuracy, yet how these construct features are associated with machine scoring accuracy remains underexplored. This study investigated how the features associated with the assessment construct of a scientific argumentation assessment item affected machine scoring performance. Specifically, we conceptualized the construct in three dimensions: complexity, diversity, and structure. We employed human experts to code characteristics of the assessment tasks and to score middle school student responses to 17 argumentation tasks aligned to three levels of a validated learning progression of scientific argumentation. We randomly selected 361 responses to use as training sets to build machine-learning scoring models for each item. The scoring models yielded a range of agreements with human consensus scores, measured by Cohen's kappa (mean = 0.60; range 0.38–0.89), indicating good to almost perfect performance. We found that higher levels of Complexity and Diversity of the assessment task were associated with decreased model performance; similarly, the relationship between levels of Structure and model performance showed a somewhat negative linear trend. These findings highlight the importance of considering these construct characteristics when developing ML models for scoring assessments, particularly for higher complexity items and multidimensional assessments.
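The abstract reports human-machine agreement with Cohen's kappa. The following minimal sketch, not taken from the paper, illustrates how such chance-corrected agreement between human consensus scores and machine-predicted scores can be computed; the score values and variable names are hypothetical.

```python
# Illustrative sketch (hypothetical data): chance-corrected agreement between
# human consensus scores and machine-predicted scores via Cohen's kappa.
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal argumentation scores (e.g., learning-progression levels 1-3)
human_scores = [1, 2, 2, 3, 1, 3, 2, 2, 1, 3]
machine_scores = [1, 2, 3, 3, 1, 2, 2, 2, 1, 3]

kappa = cohen_kappa_score(human_scores, machine_scores)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level
```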
Funder
Directorate for Education and Human Resources
Publisher
Springer Science and Business Media LLC
Subject
Computational Theory and Mathematics, Education
Cited by
2 articles.