1. Wei, J., Wang, X., Schuurmans, D., Bosma, M., Xia, F., Chi, E.H., Le, Q.V., and Zhou, D. (2022, November 28–December 9). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. Proceedings of the Advances in Neural Information Processing Systems, New Orleans, LA, USA.
2. Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., Roberts, A., Brown, T.B., Song, D., and Erlingsson, U. (2021, August 11–13). Extracting Training Data from Large Language Models. Proceedings of the USENIX Security Symposium, Virtual Event.
3. Shoeybi, M., Patwary, M., Puri, R., LeGresley, P., Casper, J., and Catanzaro, B. (2019). Megatron-LM: Training multi-billion parameter language models using model parallelism. arXiv.
4. Tirumala, K., Markosyan, A., Zettlemoyer, L., and Aghajanyan, A. (2022). Memorization without overfitting: Analyzing the training dynamics of large language models. Adv. Neural Inf. Process. Syst.
5. Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., and Chen, W. (2021). LoRA: Low-rank adaptation of large language models. arXiv.