Abstract
Background
Examinees may not put forth sufficient effort when responding to test items if the assessment carries no consequences for them. These disengaged responses can be problematic in low-stakes, large-scale assessments because they can bias item parameter estimates. However, the amount of bias, and whether it is similar across administrations, is unknown. This study compares the degree of disengagement (i.e., fast, non-effortful responses) and its impact on item parameter estimates in the Programme for International Student Assessment (PISA) across the 2015 and 2018 administrations.
Method
We detected disengaged responses at the item level based on response times and response behaviors. We used data from the United States and analyzed 51 computer-based mathematics items administered in both PISA 2015 and PISA 2018. We compared the percentage of disengaged responses and the average scores of disengaged responses for the 51 common items. We then filtered disengaged responses at the response and examinee levels and compared item difficulty (P+ and b) and item discrimination (a) before and after filtering.
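The abstract does not report the exact response-time thresholds or filtering cutoffs used, so the following is only a minimal sketch of the two filtering strategies described above. The simulated data, column names, 5-second flagging threshold, and 10% examinee cutoff are all illustrative assumptions, not values from the study; only classical item difficulty (P+, the proportion correct) is recomputed here, since calibrating the IRT a and b parameters would require a separate model fit.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format response data: one row per examinee-item pair.
rng = np.random.default_rng(0)
n_examinees, n_items = 200, 51
df = pd.DataFrame({
    "examinee_id": np.repeat(np.arange(n_examinees), n_items),
    "item_id": np.tile(np.arange(n_items), n_examinees),
    "response_time": rng.gamma(shape=2.0, scale=20.0, size=n_examinees * n_items),
    "score": rng.integers(0, 2, size=n_examinees * n_items),
})

# Flag a response as disengaged when it is faster than a fixed time threshold
# (the study also used response behaviors; a simple threshold stands in here).
THRESHOLD_SECONDS = 5.0
df["disengaged"] = df["response_time"] < THRESHOLD_SECONDS

# Response-level filtering: drop only the flagged responses.
response_filtered = df[~df["disengaged"]]

# Examinee-level filtering: drop all responses from examinees whose share of
# flagged responses exceeds an assumed cutoff (10%, purely for illustration).
flag_rate = df.groupby("examinee_id")["disengaged"].mean()
engaged_examinees = flag_rate[flag_rate <= 0.10].index
examinee_filtered = df[df["examinee_id"].isin(engaged_examinees)]

# Classical item difficulty (P+ = proportion correct) before and after filtering.
p_plus = pd.DataFrame({
    "unfiltered": df.groupby("item_id")["score"].mean(),
    "response_filtered": response_filtered.groupby("item_id")["score"].mean(),
    "examinee_filtered": examinee_filtered.groupby("item_id")["score"].mean(),
})
print(p_plus.head())
```

Comparing the columns of such a table item by item is one way to see how each filtering decision shifts difficulty estimates, which is the kind of contrast the Results section reports.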
Results
Our findings suggest only slight differences in the amount of disengagement in the U.S. results for PISA 2015 and PISA 2018. In both years, the amount of disengagement was less than 5.2%, and the average scores of disengaged responses were lower than those of engaged responses. We did not find any serious impact of disengagement on item parameter estimates when we applied response-level filtering; however, we found some bias, particularly in item difficulty, when we applied examinee-level filtering.
Conclusions
This study highlights differences in the amount of disengagement between PISA 2015 and PISA 2018, as well as the implications that decisions about handling disengaged responses have for item difficulty and discrimination estimates. The results provide important information for reporting trends across years.
Publisher
Springer Science and Business Media LLC