Do Large Language Models Show Human-like Biases? Exploring Confidence—Competence Gap in AI

Published: 2024-02-06 Issue: 2 Volume: 15 Page: 92
ISSN: 2078-2489
Container-title: Information
Language: en

Author:

Singh Aniket Kumar¹, Lamichhane Bishal², Devkota Suman³, Dhakal Uttam³, Dhakal Chandra⁴

Affiliation:

1. Department of Computing and Information Systems, Youngstown State University, Youngstown, OH 44555, USA

2. Department of Mathematics and Statistics, University of Nevada, Reno, NV 89557, USA

3. Department of Electrical and Computer Engineering, Youngstown State University, Youngstown, OH 44555, USA

4. Formerly with the Department of Agricultural and Applied Economics, University of Georgia, Athens, GA 30602, USA

Abstract

This study investigates self-assessment tendencies in Large Language Models (LLMs), examining whether their patterns resemble human cognitive biases such as the Dunning–Kruger effect. LLMs, including GPT, BARD, Claude, and LLaMA, are evaluated using confidence scores on reasoning tasks. The models provide self-assessed confidence levels before and after responding to different questions. The results show cases where high confidence does not correlate with correctness, suggesting overconfidence. Conversely, low confidence despite accurate responses indicates potential underestimation. The confidence scores vary across problem categories and difficulties, with confidence decreasing for complex queries. GPT-4 displays consistent confidence, while LLaMA and Claude show more variation. Some of these patterns resemble the Dunning–Kruger effect, in which incompetence leads to inflated self-evaluations. Although the evidence is not conclusive, these observations parallel the phenomenon and provide a foundation for further exploring the alignment of competence and confidence in LLMs. As LLMs continue to expand their societal roles, further research into their self-assessment mechanisms is warranted to fully understand their capabilities and limitations.

Publisher

MDPI AG

Link

https://www.mdpi.com/2078-2489/15/2/92/pdf

References: 25 articles.

1. Vaswani, A., Shazeer, N.M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., and Polosukhin, I. (2017, December 4–9). Attention Is All You Need. Proceedings of the NIPS 2017, Long Beach, CA, USA.

2. Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z.Y., Tang, J., Chen, X., and Lin, Y. (2023). A Survey on Large Language Model based Autonomous Agents. arXiv.

3. Zhuang, Y., Liu, Q., Ning, Y., Huang, W., Lv, R., Huang, Z., Zhao, G., Zhang, Z., Mao, Q., and Wang, S. (2023). Efficiently Measuring the Cognitive Ability of LLMs: An Adaptive Testing Perspective. arXiv.

4. Shiffrin (2023). Probing the psychology of AI models. Proc. Natl. Acad. Sci. USA.

5. Huang, J.-t., Lam, M.H.A., Li, E., Ren, S., Wang, W., Jiao, W., Tu, Z., and Lyu, M.R. (2023). Emotionally Numb or Empathetic? Evaluating How LLMs Feel Using EmotionBench. arXiv.