Affiliations:
1. Google DeepMind, Mountain View, CA 94043, USA
2. Google DeepMind, London N1C 4DN, UK
3. Stanford University, Stanford, CA 94306, USA
Abstract
Abstract reasoning is a key ability for an intelligent system. Large language models (LMs) achieve above-chance performance on abstract reasoning tasks but exhibit many imperfections. However, human abstract reasoning is also imperfect. Human reasoning is affected by our real-world knowledge and beliefs, and shows notable "content effects": humans reason more reliably when the semantic content of a problem supports the correct logical inferences. These content-entangled reasoning patterns are central to debates about the fundamental nature of human intelligence. Here, we investigate whether language models, whose prior expectations capture some aspects of human knowledge, similarly mix content into their answers to logic problems. We explore this question across three logical reasoning tasks: natural language inference, judging the logical validity of syllogisms, and the Wason selection task. We evaluate state-of-the-art LMs, as well as humans, and find that the LMs reflect many of the same qualitative human patterns on these tasks: like humans, models answer more accurately when the semantic content of a task supports the logical inferences. These parallels are reflected in accuracy patterns, and in some lower-level features like the relationship between LM confidence over possible answers and human response times. However, in some cases humans and models behave differently, particularly on the Wason task, where humans perform much worse than large models and exhibit a distinct error pattern. Our findings have implications for understanding possible contributors to these human cognitive effects, as well as the factors that influence language model performance.
Publisher: Oxford University Press (OUP)
Cited by: 3 articles