Taking It Easy: Off-the-Shelf Versus Fine-Tuned Supervised Modeling of Performance Appraisal Text
-
Published: 2024-08-28
-
ISSN: 1094-4281
-
Container-title: Organizational Research Methods
-
Language: en
-
Short-container-title: Organizational Research Methods
Author:
Andrew B. Speer (1), James Perrotta (2), Tobias L. Kordsmeyer (3)
Affiliation:
1. Department of Management & Entrepreneurship, Kelley School of Business, Indiana University, Bloomington, Indiana, USA
2. Department of Psychology, Wayne State University, Detroit, Michigan, USA
3. Department of Psychology & Leibniz Science Campus Primate Cognition, University of Göttingen, Göttingen, Germany
Abstract
When assessing text, supervised natural language processing (NLP) models have traditionally been used to measure targeted constructs in the organizational sciences. However, these models require significant resources to develop. Emerging "off-the-shelf" large language models (LLMs) offer a way to evaluate organizational constructs without building customized models, but it is unclear whether off-the-shelf LLMs score organizational constructs accurately and what evidence is necessary to infer validity. In this study, we compared the validity of supervised NLP models to that of off-the-shelf LLMs (ChatGPT-3.5 and ChatGPT-4). Across six organizational datasets and thousands of comments, we found that scores produced by supervised NLP models were more reliable than those from human coders. Notably, even though they were not developed for this purpose, off-the-shelf LLMs produced scores with psychometric properties similar to, though slightly less favorable than, those of the supervised models. We connect these findings to broader validation considerations and present a decision chart to guide researchers and practitioners in using off-the-shelf LLMs to score targeted constructs, including guidance on how psychometric evidence can be "transported" to new contexts.
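As a rough illustration of what "off-the-shelf" LLM scoring involves, the sketch below prompts a chat model to rate a single appraisal comment on a construct. The prompt wording, the construct label, and the 1-to-5 rating scale are illustrative assumptions, not the authors' actual protocol.

```python
# Minimal sketch of "off-the-shelf" LLM construct scoring: one appraisal
# comment is sent to a chat model with a prompt requesting a numeric rating.
# Assumes the openai Python package (v1+) and an OPENAI_API_KEY in the
# environment; prompt, construct, and scale are hypothetical.
from openai import OpenAI

client = OpenAI()

def score_comment(comment: str, construct: str = "overall job performance") -> str:
    """Ask an off-the-shelf chat model to rate one comment on a 1-5 scale."""
    response = client.chat.completions.create(
        model="gpt-4",  # the study compared ChatGPT-3.5 and ChatGPT-4
        messages=[
            {
                "role": "system",
                "content": (
                    f"You rate performance appraisal comments on {construct} "
                    "using a scale from 1 (low) to 5 (high). "
                    "Reply with the number only."
                ),
            },
            {"role": "user", "content": comment},
        ],
        temperature=0,  # reduce run-to-run variability in the ratings
    )
    return response.choices[0].message.content.strip()

print(score_comment("Consistently exceeds targets and mentors junior staff."))
```

In this setup, no labeled training data or model fitting is required; the tradeoff the study examines is whether such zero-shot ratings match the psychometric quality of a supervised model trained on human-coded examples.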
Publisher
SAGE Publications