Abstract
International large-scale assessments (LSAs), such as the Programme for International Student Assessment (PISA), provide essential information about the distribution of student proficiencies across a wide range of countries. The repeated assessments of the distributions of these cognitive domains offer policymakers important information for evaluating educational reforms and receive considerable attention from the media. Furthermore, the analytical strategies employed in LSAs often define methodological standards for applied researchers in the field. Hence, it is vital to critically reflect on the conceptual foundations of analytical choices in LSA studies. This article discusses the methodological challenges in selecting and specifying the scaling model used to obtain proficiency estimates from the individual student responses in LSA studies. We distinguish design-based inference from model-based inference. It is argued that for the official reporting of LSA results, design-based inference should be preferred because it allows for a clear definition of the target of inference (e.g., country mean achievement) and is less sensitive to specific modeling assumptions. More specifically, we discuss five analytical choices in the specification of the scaling model: (1) the specification of the functional form of item response functions, (2) the treatment of local dependencies and multidimensionality, (3) the consideration of test-taking behavior for estimating student ability, and the role of country differential item functioning (DIF) for (4) cross-country comparisons and (5) trend estimation. This article’s primary goal is to stimulate discussion about recently implemented changes and suggested refinements of the scaling models in LSA studies.
Funder
IPN – Leibniz-Institut für die Pädagogik der Naturwissenschaften und Mathematik an der Universität Kiel
Publisher
Leibniz Institute for Psychology (ZPID)
Cited by 12 articles.