GPT VS. HUMAN FOR SCIENTIFIC REVIEWS: A DUAL SOURCE REVIEW ON APPLICATIONS OF CHATGPT IN SCIENCE

Author:

Wu Chenxi, Varghese Alan John, Oommen Vivek, Karniadakis George Em

Abstract

The new polymath large language models (LLMs) can greatly speed up scientific reviews, possibly using more unbiased quantitative metrics, facilitating cross-disciplinary connections, and identifying emerging trends and research gaps by analyzing large volumes of data. At present, however, they lack the required deep understanding of complex methodologies, have difficulty evaluating innovative claims, and are unable to assess ethical issues and conflicts of interest. Herein, we consider 13 generative pre-trained transformer (GPT)-related papers across different scientific domains, reviewed by a human reviewer and by SciSpace, a large language model, with the reviews evaluated by three distinct types of evaluators, namely GPT-3.5, a crowd panel, and GPT-4. We found that 50% of SciSpace's responses to objective questions align with those of the human reviewer, with GPT-4 (the informed evaluator) often rating the human reviewer higher in accuracy, and SciSpace higher in structure, clarity, and completeness. On subjective questions, the uninformed evaluators (GPT-3.5 and the crowd panel) showed varying preferences between the SciSpace and human responses, with the crowd panel preferring the human responses. GPT-4, however, rated the two equally in accuracy and structure but favored SciSpace for completeness.

Publisher

Begell House
