Affiliation:
1. Sogang University
2. Goethe University Frankfurt
3. Educational Testing Service
4. Independent Researcher
Abstract
Many international large‐scale assessments (ILSAs) have switched to multistage adaptive testing (MST) designs to improve efficiency in measuring the skills of heterogeneous populations around the world. In this context, previous literature has reported acceptable model parameter recovery under MST designs when the current item response theory (IRT)‐based scaling models are used. However, previous studies have not considered the influence of realistic phenomena commonly observed in ILSA data, such as item‐by‐country interactions, repeated use of MST designs in subsequent cycles, and nonresponse, including omitted and not‐reached items. The purpose of this study is to examine the robustness of current IRT‐based scaling models to these three factors under MST designs, using the Programme for International Student Assessment (PISA) designs as an example. A series of simulation studies shows that the IRT scaling models used in PISA are robust to repeated use of the MST design in a subsequent cycle with fewer items and smaller sample sizes, while item‐by‐country interactions and not‐reached items have negligible to modest effects on model parameter estimation, and omitted responses have the largest effect. The discussion section provides recommendations and implications for future MST designs and scaling models for ILSAs.
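To illustrate the kind of data-generating process examined in the abstract, the sketch below simulates item responses under a 2PL IRT model with a simple two-stage MST routing rule. This is a minimal, assumption-laden illustration, not the authors' simulation design or PISA's operational routing: the item parameters, module sizes, and routing cut score are all hypothetical.

```python
# Minimal sketch (illustrative assumptions, not the authors' code): item responses
# generated from a 2PL IRT model with a two-stage MST design in which a routing
# module assigns examinees to an easier or harder second-stage module.
import numpy as np

rng = np.random.default_rng(2024)

def prob_2pl(theta, a, b):
    """2PL probability of a correct response for ability theta."""
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

def simulate_mst(theta, modules, cut=0.5):
    """Administer the routing module, then branch to the easy or hard module
    based on the proportion-correct score in stage 1."""
    a_r, b_r = modules["routing"]
    stage1 = (rng.random(a_r.size) < prob_2pl(theta, a_r, b_r)).astype(int)
    branch = "hard" if stage1.mean() >= cut else "easy"
    a_s, b_s = modules[branch]
    stage2 = (rng.random(a_s.size) < prob_2pl(theta, a_s, b_s)).astype(int)
    return branch, np.concatenate([stage1, stage2])

# Hypothetical item bank: discrimination (a) and difficulty (b) per module.
modules = {
    "routing": (np.full(8, 1.0),  np.linspace(-1.0, 1.0, 8)),
    "easy":    (np.full(10, 1.0), np.linspace(-2.0, 0.0, 10)),
    "hard":    (np.full(10, 1.0), np.linspace(0.0, 2.0, 10)),
}

# Simulate a small heterogeneous population and tally routing decisions.
thetas = rng.normal(0.0, 1.0, size=1000)
branches = [simulate_mst(t, modules)[0] for t in thetas]
print("routed to hard module:", branches.count("hard") / len(branches))
```

In a study like the one described, phenomena such as item‐by‐country interactions or omitted and not‐reached responses would be layered onto a generator of this sort before the IRT scaling models are fit to the simulated data.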