Abstract
Psychometric properties of perceptual assessments, such as reliability, depend on the stochastic properties of psychophysical sampling procedures, which result in method variability, as well as on inter- and intra-subject variability. Method variability is commonly minimized by optimizing sampling procedures through computer simulations. Inter-subject variability is inherent in the population of interest and cannot be acted upon. In contrast, intra-subject variability introduced by confounds cannot simply be quantified from experimental data, as these data also include method variability. Therefore, this aspect is generally neglected when developing assessments. Yet, comparing method variability and intra-subject variability could give insight into whether effort should be invested in optimizing the sampling procedure, or in addressing potential confounds instead. We propose a new approach to estimate and model intra-subject variability of psychometric functions by combining computer simulations and behavioral data, and to account for it when simulating experiments. The approach was illustrated in a real-world scenario of proprioceptive difference threshold assessments. The behavioral study revealed a test-retest reliability of 0.212. Computer simulations lacking intra-subject variability predicted a reliability of 0.768, whereas the new approach, which includes an intra-subject variability model, led to a realistic estimate of reliability (0.207). Such a model also allows computing the theoretically maximally attainable reliability (0.552) assuming an ideal sampling procedure. Comparing the reliability estimates when exclusively accounting for method variability versus intra-subject variability reveals that intra-subject variability should be reduced by addressing confounds, and that optimizing the sampling procedure alone may be insufficient to achieve high reliability.
The new approach also allows the development of assessments to be accelerated by simulating the convergence behavior of the reliability confidence interval for large numbers of subjects and retests without requiring additional experiments. Such a tool of predictive value is especially important for target populations where time is scarce, for example in clinical settings.
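To make the underlying variance decomposition concrete, the following minimal Python sketch simulates two assessment sessions per subject under an additive-noise model. All parameter values and the simple Gaussian model are illustrative assumptions for this sketch, not the study's actual sampling procedure or fitted values; it only shows why simulations that ignore intra-subject variability overestimate test-retest reliability, and how an ideal sampling procedure bounds the attainable reliability.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 2000            # simulated subjects (large, so correlations are stable)
SD_INTER = 1.0      # inter-subject SD of true thresholds (assumed)
SD_INTRA = 0.8      # intra-subject session-to-session SD from confounds (assumed)
SD_METHOD = 0.5     # method variability of the sampling procedure (assumed)

def simulated_reliability(sd_intra, sd_method):
    """Test-retest reliability as the Pearson correlation between
    threshold estimates from two simulated sessions."""
    true_thresholds = rng.normal(0.0, SD_INTER, N)
    sessions = [
        true_thresholds
        + rng.normal(0.0, sd_intra, N)    # session-specific shift of the subject
        + rng.normal(0.0, sd_method, N)   # noise from the sampling procedure
        for _ in range(2)
    ]
    return float(np.corrcoef(sessions[0], sessions[1])[0, 1])

r_method_only    = simulated_reliability(0.0, SD_METHOD)   # optimistic prediction
r_full           = simulated_reliability(SD_INTRA, SD_METHOD)  # realistic prediction
r_ideal_sampling = simulated_reliability(SD_INTRA, 0.0)    # upper bound (ideal procedure)
```

With these assumed variances the ordering mirrors the pattern reported in the abstract: ignoring intra-subject variability yields an inflated reliability, the full model yields a lower, realistic one, and removing method variability gives the theoretical ceiling imposed by intra-subject variability alone.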
Publisher
Cold Spring Harbor Laboratory