Affiliation:
1. University of Pennsylvania
Abstract
A multiscale criterion-referenced test featuring two presumably equivalent forms (A and B) was administered to 1,667 Head Start children at each of four points over an academic year. Using a randomly equivalent groups design, three equating methods were applied: common-item IRT equating using concurrent calibration, linear transformation, and equipercentile transformation. The methods were compared by examining mean score differences, weighted mean squared differences, and Kolmogorov's D statistics for each subscale. The results indicated that, over time, the IRT equating method and the conventional equating methods exhibited different patterns of discrepancy between the two test forms. IRT equating yielded marginally smaller form-to-form mean score differences and generated slightly fewer distributional discrepancies between Forms A and B than either linear or equipercentile equating. However, the results were mixed, indicating that more studies are needed to clarify the relative merits and weaknesses of each approach.
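As a rough illustration of two ideas named in the abstract, the sketch below shows textbook linear equating (matching the mean and standard deviation of Form A scores to those of Form B) and a two-sample Kolmogorov D statistic (the largest gap between two empirical distribution functions). It is not the authors' procedure: the function names `linear_equate` and `ks_d` and the score lists are hypothetical, invented only for this example, and IRT and equipercentile equating are not shown.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def linear_equate(x, scores_a, scores_b):
    """Map a Form A score x onto the Form B scale by matching mean and SD."""
    mu_a, mu_b = mean(scores_a), mean(scores_b)
    sd_a, sd_b = pstdev(scores_a), pstdev(scores_b)
    return mu_b + (sd_b / sd_a) * (x - mu_a)


def ks_d(sample_1, sample_2):
    """Two-sample Kolmogorov D: largest gap between the empirical CDFs."""
    s1, s2 = sorted(sample_1), sorted(sample_2)

    def ecdf(sorted_vals, v):
        # Proportion of observations less than or equal to v.
        return bisect_right(sorted_vals, v) / len(sorted_vals)

    # The maximum gap can only occur at an observed score.
    return max(abs(ecdf(s1, v) - ecdf(s2, v)) for v in s1 + s2)


# Hypothetical raw scores, invented purely for illustration.
form_a = [10, 12, 14, 16, 18]
form_b = [20, 24, 28, 32, 36]

# Form A scores expressed on the Form B scale.
equated = [linear_equate(x, form_a, form_b) for x in form_a]

# D between the unequated forms; these samples do not overlap at all.
d_before = ks_d(form_a, form_b)
```

In a comparison like the one described, D would be computed between the equated Form A distribution and the Form B distribution for each subscale, with smaller values suggesting closer agreement between forms.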
Cited by
1 article.