Affiliation:
1. Educational Testing Service, USA
Abstract
Automated essay scoring can produce reliable scores that are highly correlated with human scores, but it is limited in its evaluation of content and other higher-order aspects of writing. The increased use of automated essay scoring in high-stakes testing underscores the need for human scoring that is focused on higher-order aspects of writing. This study experimentally evaluated several alternative procedures for eliciting distinct human scores and improving their reliability. Essays written in response to the argument and issue tasks of the Analytical Writing measure of the GRE General Test were scored by experienced raters under different conditions. Criteria for evaluation included inter-rater agreement, agreement with machine scores, and cross-task reliability. First, the use of a modified scoring rubric that focused on higher-order writing skills increased reliability for one task type but decreased it for the other. Second, scoring in batches of similar-length essays had no effect on scores. Third, scoring with automated essay scores available increased the reliability of human scores, but it also increased their similarity to the automated scores. Finally, the use of a more refined 18-point scoring scale significantly increased reliability.
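The abstract does not state which agreement and reliability statistics were used. As an illustrative sketch only, the snippet below shows how inter-rater agreement and human-machine agreement of the kind described could be quantified, assuming quadratic-weighted kappa and Pearson correlation as stand-in metrics and using hypothetical scores on a 6-point rubric; none of these values or metric choices come from the paper.

# Illustrative sketch, not the authors' analysis code.
# Assumes two raters scored the same essays on a 6-point rubric and that
# automated scores are available for the same essays (all values hypothetical).
from sklearn.metrics import cohen_kappa_score
from scipy.stats import pearsonr

rater_a = [4, 5, 3, 6, 2, 4, 5, 3]   # hypothetical scores from rater A
rater_b = [4, 4, 3, 5, 2, 5, 5, 3]   # hypothetical scores from rater B
machine = [4.2, 4.6, 3.1, 5.4, 2.3, 4.4, 4.9, 3.2]  # hypothetical machine scores

# Inter-rater agreement: quadratic-weighted kappa penalizes large
# disagreements more heavily than adjacent-score disagreements.
qwk = cohen_kappa_score(rater_a, rater_b, weights="quadratic")

# Agreement with machine scores: correlation between the averaged human
# score and the automated score for each essay.
human_mean = [(a + b) / 2 for a, b in zip(rater_a, rater_b)]
r, _ = pearsonr(human_mean, machine)

print(f"Quadratic-weighted kappa (rater A vs. rater B): {qwk:.2f}")
print(f"Human-machine correlation: {r:.2f}")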
Subject
Linguistics and Language, Social Sciences (miscellaneous), Language and Linguistics
Cited by
36 articles.