GPT VS. HUMAN FOR SCIENTIFIC REVIEWS: A DUAL SOURCE REVIEW ON APPLICATIONS OF CHATGPT IN SCIENCE

Published: 2024
Volume: 5
Issue: 2
Pages: 1-44
ISSN: 2689-3967
Language: en
Container title: Journal of Machine Learning for Modeling and Computing
Short container title: J Mach Learn Model Comput
Authors: Wu Chenxi, Varghese Alan John, Oommen Vivek, Karniadakis George Em
Abstract
The new polymath large language models (LLMs) can greatly speed up scientific reviews, possibly using more unbiased quantitative metrics, facilitating cross-disciplinary connections, and identifying emerging trends and research gaps by analyzing large volumes of data. However, at present they lack the deep understanding of complex methodologies that reviewing requires, they have difficulty evaluating innovative claims, and they are unable to assess ethical issues and conflicts of interest. Herein, we consider 13 geotechnical parrot tales (GPT)-related papers across different scientific domains, each reviewed by a human reviewer and by SciSpace, a large language model, with the reviews evaluated by three distinct types of evaluators, namely GPT-3.5, a crowd panel, and GPT-4. We found that 50% of SciSpace's responses to objective questions align with those of the human reviewer, with GPT-4 (the informed evaluator) often rating the human reviewer higher in accuracy and SciSpace higher in structure, clarity, and completeness. For subjective questions, the uninformed evaluators (GPT-3.5 and the crowd panel) showed varying preferences between SciSpace and human responses, with the crowd panel preferring the human responses. GPT-4, however, rated the two equally in accuracy and structure but favored SciSpace for completeness.
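The 50% figure above is a simple agreement rate over objective review questions. As an illustrative sketch only (the function name and all answer data below are hypothetical placeholders, not the study's actual responses or methodology), such a rate can be computed as the fraction of questions on which the two reviewers give the same answer:

```python
# Illustrative sketch of an agreement-rate calculation between two reviewers.
# All names and data are hypothetical, not taken from the paper.

def agreement_rate(responses_a, responses_b):
    """Fraction of questions on which two reviewers give the same answer."""
    if len(responses_a) != len(responses_b):
        raise ValueError("Reviewers must answer the same set of questions")
    matches = sum(a == b for a, b in zip(responses_a, responses_b))
    return matches / len(responses_a)

# Hypothetical yes/no answers to four objective questions for one paper.
human    = ["yes", "no", "yes", "yes"]
scispace = ["yes", "no", "no", "no"]

print(agreement_rate(human, scispace))  # 0.5
```

In this toy example the two reviewers agree on two of four questions, giving a 0.5 agreement rate, i.e., the kind of alignment statistic the abstract reports.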