Fluent but Not Factual: A Comparative Analysis of ChatGPT and Other AI Chatbots’ Proficiency and Originality in Scientific Writing for Humanities

Published:2023-10-13 Issue:10 Volume:15 Page:336
ISSN:1999-5903
Container-title:Future Internet
language:en
Short-container-title:Future Internet

Author:

Lozić Edisa¹, Štular Benjamin¹

Affiliation:

1. Research Centre of the Slovenian Academy of Sciences and Arts, 1000 Ljubljana, Slovenia

Abstract

Historically, mastery of writing was deemed essential to human progress. However, recent advances in generative AI have marked an inflection point in this narrative, including for scientific writing. This article provides a comprehensive analysis of the capabilities and limitations of six AI chatbots in scholarly writing in the humanities and archaeology. The methodology was based on tagging AI-generated content for quantitative accuracy and qualitative precision by human experts. Quantitative accuracy assessed the factual correctness in a manner similar to grading students, while qualitative precision gauged the scientific contribution similar to reviewing a scientific article. In the quantitative test, ChatGPT-4 scored near the passing grade (−5) whereas ChatGPT-3.5 (−18), Bing (−21) and Bard (−31) were not far behind. Claude 2 (−75) and Aria (−80) scored much lower. In the qualitative test, all AI chatbots, but especially ChatGPT-4, demonstrated proficiency in recombining existing knowledge, but all failed to generate original scientific content. As a side note, our results suggest that with ChatGPT-4, the size of large language models has reached a plateau. Furthermore, this paper underscores the intricate and recursive nature of human research. This process of transforming raw data into refined knowledge is computationally irreducible, highlighting the challenges AI chatbots face in emulating human originality in scientific writing. Our results apply to the state of affairs in the third quarter of 2023. In conclusion, while large language models have revolutionised content generation, their ability to produce original scientific contributions in the humanities remains limited. We expect this to change in the near future as current large language model-based AI chatbots evolve into large language model-powered software.

Funder

European Union’s Horizon Europe research and innovation programme

Slovenian Research and Innovation Agency

Publisher

MDPI AG

Subject

Computer Networks and Communications

Link

https://www.mdpi.com/1999-5903/15/10/336/pdf


Cited by 1 article.

1. A Structured Narrative Prompt for Prompting Narratives from Large Language Models: Sentiment Assessment of ChatGPT-Generated Narratives and Real Tweets;Future Internet;2023-11-23