Affiliation:
1. The Graduate School of Education at Stanford University, Stanford, CA, USA
Abstract
The fit of an item response model is typically conceptualized as whether a given model could have generated the data. In this study, an alternative view of fit, “predictive fit,” based on the model’s ability to predict new data, is advocated. The authors define two prediction tasks: “missing responses prediction” (where the goal is to predict an in-sample person’s response to an in-sample item) and “missing persons prediction” (where the goal is to predict an out-of-sample person’s string of responses). Based on these prediction tasks, two predictive fit metrics are derived for item response models that assess how well an estimated item response model fits the data-generating model. These metrics are based on long-run out-of-sample predictive performance (i.e., if the data-generating model produced infinite amounts of data, what is the quality of a model’s predictions on average?). Simulation studies are conducted to identify the prediction-maximizing model across a variety of conditions. For example, defining prediction in terms of missing responses, greater average person ability, and greater item discrimination are all associated with the 3PL model producing relatively worse predictions, and thus lead to greater minimum sample sizes for the 3PL model. In each simulation, the prediction-maximizing model is compared to the models selected by Akaike’s information criterion (AIC), the Bayesian information criterion (BIC), and likelihood ratio tests. It is found that the performance of these methods depends on the prediction task of interest. In general, likelihood ratio tests often select overly flexible models, while BIC selects overly parsimonious models. The authors use Programme for International Student Assessment data to demonstrate how to use cross-validation to directly estimate the predictive fit metrics in practice. The implications for item response model selection in operational settings are discussed.
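As an illustration of the “missing responses prediction” task, the following is a minimal sketch rather than the authors’ procedure. It assumes the standard three-parameter logistic (3PL) item response function and uses mean held-out log-likelihood as the scoring rule; both the scoring rule and the names `irf_3pl` and `heldout_log_likelihood` are assumptions made for this example. To keep the sketch short, the data-generating probabilities stand in for the predictions of a fitted model; in practice they would come from a model estimated on the training cells only.

```python
import numpy as np


def irf_3pl(theta, a, b, c):
    """Standard 3PL item response function.

    theta: person abilities, shape (n_persons,)
    a, b, c: item discrimination, difficulty, and guessing, each shape (n_items,)
    Returns P(correct) with shape (n_persons, n_items). Setting c = 0 gives the 2PL.
    """
    logits = a[None, :] * (theta[:, None] - b[None, :])
    return c[None, :] + (1.0 - c[None, :]) / (1.0 + np.exp(-logits))


def heldout_log_likelihood(responses, probs, heldout_mask):
    """Mean Bernoulli log-likelihood of the held-out cells under predicted probabilities."""
    p = np.clip(probs[heldout_mask], 1e-12, 1 - 1e-12)  # guard against log(0)
    y = responses[heldout_mask]
    return float(np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Simulate dichotomous responses from a 2PL model (illustrative values only).
rng = np.random.default_rng(0)
n_persons, n_items = 500, 20
theta = rng.normal(size=n_persons)
a = rng.uniform(0.5, 2.0, size=n_items)
b = rng.normal(size=n_items)
c = np.zeros(n_items)

true_probs = irf_3pl(theta, a, b, c)
responses = (rng.random((n_persons, n_items)) < true_probs).astype(int)

# Hold out roughly 20% of person-item cells as the "missing responses".
heldout_mask = rng.random((n_persons, n_items)) < 0.2

# Baseline predictor: each item's proportion correct among the training cells.
train = np.where(heldout_mask, np.nan, responses)
item_means = np.nanmean(train, axis=0)
baseline_probs = np.broadcast_to(item_means, responses.shape)

ll_model = heldout_log_likelihood(responses, true_probs, heldout_mask)
ll_base = heldout_log_likelihood(responses, baseline_probs, heldout_mask)
print(f"held-out log-likelihood, IRT probabilities: {ll_model:.4f}")
print(f"held-out log-likelihood, item-mean baseline: {ll_base:.4f}")
```

A higher (less negative) held-out log-likelihood indicates better predictions for the masked cells. Repeating the masking over several folds and averaging would give a cross-validated estimate under these assumptions.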
Funder
Institute of Education Sciences
The Spencer Foundation Grant
Subject
Psychology (miscellaneous), Social Sciences (miscellaneous)
Cited by
3 articles.