Validating a forced-choice method for eliciting quality-of-reasoning judgments
Published: 2023-10-13
ISSN: 1554-3528
Container-title: Behavior Research Methods
Short-container-title: Behav Res
Language: en
Authors: Alexandru Marcoci, Margaret E. Webb, Luke Rowe, Ashley Barnett, Tamar Primoratz, Ariel Kruger, Christopher W. Karvetski, Benjamin Stone, Michael L. Diamond, Morgan Saletta, Tim van Gelder, Philip E. Tetlock, Simon Dennis
Abstract
In this paper we investigate the criterion validity of forced-choice comparisons of the quality of written arguments with normative solutions. Across two studies, novices and experts assessing quality of reasoning through a forced-choice design were both able to choose arguments supporting more accurate solutions—62.2% (SE = 1%) of the time for novices and 74.4% (SE = 1%) for experts—and arguments produced by larger teams—up to 82% of the time for novices and 85% for experts—with high inter-rater reliability, namely 70.58% (95% CI = 1.18) agreement for novices and 80.98% (95% CI = 2.26) for experts. We also explored two methods for increasing efficiency. We found that the number of comparative judgments needed could be substantially reduced with little accuracy loss by leveraging transitivity and producing quality-of-reasoning assessments using an AVL tree method. Moreover, a regression model trained to predict scores based on automatically derived linguistic features of participants’ judgments achieved a high correlation with the objective accuracy scores of the arguments in our dataset. Despite the inherent subjectivity involved in evaluating differing quality of reasoning, the forced-choice paradigm allows even novice raters to perform beyond chance and can provide a valid, reliable, and efficient method for producing quality-of-reasoning assessments at scale.
Funder: Intelligence Advanced Research Projects Activity
Publisher: Springer Science and Business Media LLC
Subjects: General Psychology; Psychology (miscellaneous); Arts and Humanities (miscellaneous); Developmental and Educational Psychology; Experimental and Cognitive Psychology