Affiliation:
1. Arizona State University, Mesa, AZ, USA
Abstract
Large Language Models (LLMs), with their novel conversational interaction format, could create miscalibrated expectations about their capabilities. The present study investigates human expectations of a generic LLM's capabilities and limitations. Participants in an online study were shown a series of prompts covering a wide range of tasks and asked to assess the likelihood that the LLM could help with each task. The result is a catalog of people's general expectations of LLM capabilities across various task domains. Compared against the actual capabilities of a specific system, this catalog could alert developers to potential over- or under-reliance on the technology caused by such misconceptions. To explore a potential way of correcting misconceptions, we also attempted to manipulate participants' expectations with three different interface designs. In most of the tested task domains, such as computation and text processing, however, these designs appear insufficient to override people's initial expectations.