Affiliations:
1. Business School, University of Mannheim
2. GESIS–Leibniz Institute for the Social Sciences
3. Department of Society, Technology and Human Factors, RWTH Aachen University
4. Complexity Science Hub Vienna, Vienna, Austria
Abstract
We illustrate how standard psychometric inventories originally designed for assessing noncognitive human traits can be repurposed as diagnostic tools to evaluate analogous traits in large language models (LLMs). We start from the assumption that LLMs, inadvertently yet inevitably, acquire psychological traits (metaphorically speaking) from the vast text corpora on which they are trained. Such corpora contain sediments of the personalities, values, beliefs, and biases of the countless human authors of these texts, which LLMs learn through a complex training process. The traits that LLMs acquire in such a way can potentially influence their behavior, that is, their outputs in downstream tasks and applications in which they are employed, which in turn may have real-world consequences for individuals and social groups. By eliciting LLMs’ responses to language-based psychometric inventories, we can bring their traits to light. Psychometric profiling enables researchers to study and compare LLMs in terms of noncognitive characteristics, thereby providing a window into the personalities, values, beliefs, and biases these models exhibit (or mimic). We discuss the history of similar ideas and outline possible psychometric approaches for LLMs. We demonstrate one promising approach, zero-shot classification, for several LLMs and psychometric inventories. We conclude by highlighting open challenges and future avenues of research for AI Psychometrics.