Affiliation:
1. University of Wisconsin - Milwaukee, USA
Abstract
This article investigates how measurement models and statistical procedures can be applied to estimate the accuracy of proficiency classification in language testing. The paper starts with a concise introduction to four measurement models: the classical test theory (CTT) model, the dichotomous item response theory (IRT) model, the testlet response theory (TRT) model, and the polytomous item response theory (Poly-IRT) model. Following this, two classification procedures are presented: the Livingston and Lewis method for CTT and the Rudner method for the three IRT-based models. The utility of these models and procedures is then evaluated by examining the accuracy of classifying 5000 language test takers from a large-scale language certification examination into two proficiency categories. The most important finding is that the testlet format (multiple questions based on one prompt), which language tests usually rely on, has a great impact on proficiency classification. All testlets in this study show a strong testlet effect. Hence, the TRT model is recommended for proficiency classification. Using the standard IRT model would inflate the classification accuracy because it underestimates measurement error. Meanwhile, using the Poly-IRT model would give slightly less accurate classification results. Concerning the CTT model, while its classification accuracy is comparable to that of the TRT model, the classification results of the two models are considerably inconsistent.
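As a rough illustration of the kind of calculation a Rudner-style procedure involves, the sketch below estimates expected classification accuracy for a single cut score. It assumes that each test taker's true ability is normally distributed around the ability estimate with the reported standard error; the function names, the ability values, the standard errors, and the cut score are all hypothetical and are not taken from the study. It also shows, under those assumptions, why an underestimated measurement error leads to a higher apparent accuracy.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_classification_accuracy(theta_hats, std_errors, cut_score):
    """Average probability that each observed pass/fail decision is correct.

    Assumption: true ability ~ Normal(theta_hat, se) for every test taker.
    """
    total = 0.0
    for theta_hat, se in zip(theta_hats, std_errors):
        # Probability that the true ability lies at or above the cut score.
        p_above = 1.0 - normal_cdf((cut_score - theta_hat) / se)
        # The observed decision is "pass" when the estimate reaches the cut.
        total += p_above if theta_hat >= cut_score else 1.0 - p_above
    return total / len(theta_hats)


# Hypothetical ability estimates for five test takers (not study data).
thetas = [-1.2, -0.3, 0.1, 0.8, 1.5]
cut = 0.0

# Same estimates, two hypothetical standard errors: smaller vs. larger.
acc_small_se = expected_classification_accuracy(thetas, [0.3] * 5, cut)
acc_large_se = expected_classification_accuracy(thetas, [0.5] * 5, cut)
print(acc_small_se > acc_large_se)
```

With identical ability estimates, the smaller standard errors give the higher expected accuracy, which mirrors the abstract's point that a model understating measurement error overstates classification accuracy.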
Subject
Linguistics and Language,Social Sciences (miscellaneous),Language and Linguistics
Cited by
27 articles.