Taking It Easy: Off-the-Shelf Versus Fine-Tuned Supervised Modeling of Performance Appraisal Text
-
Published: 2024-08-28
-
ISSN: 1094-4281
-
Container-title: Organizational Research Methods
-
Language: en
-
Short-container-title: Organizational Research Methods
Author:
Andrew B. Speer (1), James Perrotta (2), Tobias L. Kordsmeyer (3)
Affiliation:
1. Department of Management & Entrepreneurship, Kelley School of Business, Indiana University, Bloomington, Indiana, USA
2. Department of Psychology, Wayne State University, Detroit, Michigan, USA
3. Department of Psychology & Leibniz Science Campus Primate Cognition, University of Göttingen, Göttingen, Germany
Abstract
When assessing text, supervised natural language processing (NLP) models have traditionally been used to measure targeted constructs in the organizational sciences. However, these models require significant resources to develop. Emerging "off-the-shelf" large language models (LLMs) offer a way to evaluate organizational constructs without building customized models, but it is unclear whether off-the-shelf LLMs score organizational constructs accurately and what evidence is necessary to infer validity. In this study, we compared the validity of supervised NLP models to that of off-the-shelf LLMs (ChatGPT-3.5 and ChatGPT-4). Across six organizational datasets and thousands of comments, we found that scores produced by supervised NLP models were more reliable than those from human coders. Notably, even though they were not developed for this purpose, off-the-shelf LLMs produced scores with psychometric properties similar to, though slightly less favorable than, those of the supervised models. We connect these findings to broader validation considerations and present a decision chart to guide researchers and practitioners in using off-the-shelf LLMs to score targeted constructs, including guidance on how psychometric evidence can be "transported" to new contexts.
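As a rough illustration of what "off-the-shelf" LLM scoring involves, the sketch below prompts a chat model to rate a single appraisal comment on a construct. The prompt wording, the construct label, and the 1-to-5 rating scale are illustrative assumptions, not the authors' actual protocol.

```python
# Minimal sketch of "off-the-shelf" LLM construct scoring: one appraisal
# comment is sent to a chat model with a prompt requesting a numeric rating.
# Assumes the openai Python package (v1+) and an OPENAI_API_KEY in the
# environment; prompt, construct, and scale are hypothetical.
from openai import OpenAI

client = OpenAI()

def score_comment(comment: str, construct: str = "overall job performance") -> str:
    """Ask an off-the-shelf chat model to rate one comment on a 1-5 scale."""
    response = client.chat.completions.create(
        model="gpt-4",  # the study compared ChatGPT-3.5 and ChatGPT-4
        messages=[
            {
                "role": "system",
                "content": (
                    f"You rate performance appraisal comments on {construct} "
                    "using a scale from 1 (low) to 5 (high). "
                    "Reply with the number only."
                ),
            },
            {"role": "user", "content": comment},
        ],
        temperature=0,  # reduce run-to-run variability in the ratings
    )
    return response.choices[0].message.content.strip()

print(score_comment("Consistently exceeds targets and mentors junior staff."))
```

In this setup, no labeled training data or model fitting is required; the tradeoff the study examines is whether such zero-shot ratings match the psychometric quality of a supervised model trained on human-coded examples.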
Publisher
SAGE Publications