Predictive Fit Metrics for Item Response Models

Published: 2022-02-13 Issue: 2 Volume: 46 Page: 136-155
ISSN: 0146-6216
Container-title: Applied Psychological Measurement
Language: en
Short-container-title: Applied Psychological Measurement

Author:

Stenhaug Benjamin A.¹, Domingue Benjamin W.¹

Affiliation:

1. The Graduate School of Education at Stanford University, Stanford, CA, USA

Abstract

The fit of an item response model is typically conceptualized as whether a given model could have generated the data. In this study, an alternative view of fit, “predictive fit,” based on the model’s ability to predict new data, is advocated. The authors define two prediction tasks: “missing responses prediction,” where the goal is to predict an in-sample person’s response to an in-sample item, and “missing persons prediction,” where the goal is to predict an out-of-sample person’s string of responses. Based on these prediction tasks, two predictive fit metrics are derived for item response models that assess how well an estimated item response model fits the data-generating model. These metrics are based on long-run out-of-sample predictive performance (i.e., if the data-generating model produced infinite amounts of data, what is the quality of a model’s predictions on average?). Simulation studies are conducted to identify the prediction-maximizing model across a variety of conditions. For example, when prediction is defined in terms of missing responses, greater average person ability and greater item discrimination are both associated with the 3PL model producing relatively worse predictions, and thus lead to greater minimum sample sizes for the 3PL model. In each simulation, the prediction-maximizing model is compared to the models selected by Akaike’s information criterion (AIC), the Bayesian information criterion (BIC), and likelihood ratio tests. It is found that the performance of these methods depends on the prediction task of interest. In general, likelihood ratio tests often select overly flexible models, while BIC selects overly parsimonious models. The authors use Programme for International Student Assessment data to demonstrate how to use cross-validation to directly estimate the predictive fit metrics in practice. The implications for item response model selection in operational settings are discussed.
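The two prediction tasks named in the abstract can be made concrete with a small sketch. The Python below is an illustration only, not the authors' implementation: the function names, the use of mean held-out log-likelihood as the score, the standard normal ability distribution, and the Gauss-Hermite integration are all assumptions made here. It also plugs in the generating parameters where a real analysis would use estimates fitted to the training portion of each cross-validation fold.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss


def irf(theta, a, b, c=0.0):
    """3PL item response function; c=0 gives the 2PL, a=1 with c=0 the 1PL."""
    return c + (1.0 - c) / (1.0 + np.exp(-a * (theta - b)))


def bernoulli_loglik(y, p):
    """Elementwise log-likelihood of binary responses y under probabilities p."""
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return y * np.log(p) + (1 - y) * np.log(1.0 - p)


def missing_responses_score(y, theta, a, b, c, held_out):
    """Mean log-likelihood of held-out cells (in-sample persons and items)."""
    p = irf(theta[:, None], a[None, :], b[None, :], c[None, :])
    return bernoulli_loglik(y, p)[held_out].mean()


def missing_persons_score(y_new, a, b, c, n_nodes=21):
    """Mean marginal log-likelihood of new persons' response strings.

    Ability is integrated out over an ASSUMED standard normal distribution
    using Gauss-Hermite quadrature.
    """
    nodes, weights = hermegauss(n_nodes)
    weights = weights / weights.sum()  # rescale to N(0, 1) probabilities
    p = irf(nodes[:, None], a[None, :], b[None, :], c[None, :])
    p = np.clip(p, 1e-12, 1.0 - 1e-12)  # shape: (nodes, items)
    # Log-likelihood of each person's full response string at each node.
    ll = y_new @ np.log(p).T + (1 - y_new) @ np.log(1.0 - p).T
    m = ll.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    marginal = m[:, 0] + np.log(np.exp(ll - m) @ weights)
    return marginal.mean()


# Hypothetical simulated data: 500 persons, 20 items, 2PL generating model.
rng = np.random.default_rng(0)
n_persons, n_items = 500, 20
theta = rng.normal(size=n_persons)
a = rng.uniform(0.5, 2.0, size=n_items)
b = rng.normal(size=n_items)
c = np.zeros(n_items)
y = (rng.random((n_persons, n_items)) < irf(theta[:, None], a, b, c)).astype(int)

# Missing responses: hold out roughly 20% of individual person-item cells.
held_out = rng.random(y.shape) < 0.2
score_mr = missing_responses_score(y, theta, a, b, c, held_out)

# Missing persons: score entire response strings of persons not seen before.
theta_new = rng.normal(size=200)
y_new = (rng.random((200, n_items)) < irf(theta_new[:, None], a, b, c)).astype(int)
score_mp = missing_persons_score(y_new, a, b, c)

print(score_mr, score_mp)
```

Under this setup, a higher (less negative) score indicates better predictions, so candidate models such as the 1PL, 2PL, and 3PL could be compared by fitting each on the training data and scoring it on the same held-out cells or persons. The two scores are on different scales: the first is per response, the second per response string.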

Funder

Institute of Education Sciences

The Spencer Foundation Grant

Publisher

SAGE Publications

Subject

Psychology (miscellaneous), Social Sciences (miscellaneous)

Link

http://journals.sagepub.com/doi/pdf/10.1177/01466216211066603

Reference: 45 articles.

1. A new look at the statistical model identification

2. A goodness of fit test for the Rasch model

3. Item Response Theory

4. Bates S., Hastie T., Tibshirani R. (2021). Cross-validation: What does it estimate and how well does it do it? ArXiv Preprint arXiv:2104.00673.

Cited by 3 articles.

1. The InterModel Vigorish as a Lens for Understanding (and Quantifying) the Value of Item Response Models for Dichotomously Coded Items;Psychometrika;2024-06-03

2. Challenging the illusion of objectivity: an in-depth analysis of the preselected items evaluation (PIE) method in translation evaluation;Journal of Applied Research in Higher Education;2024-06-03

3. A Comparative Study of Item Response Theory Models for Mixed Discrete-Continuous Responses;Journal of Intelligence;2024-02-25