Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other large language models in scholarly peer review

Published: 2023-05-18
Volume: 8, Issue: 1
ISSN: 2058-8615
Journal: Research Integrity and Peer Review (Res Integr Peer Rev)
Language: en

Authors:

Hosseini, Mohammad; Horbach, Serge P. J. M.

Abstract

Background: The emergence of systems based on large language models (LLMs) such as OpenAI’s ChatGPT has created a range of discussions in scholarly circles. Since LLMs generate grammatically correct and mostly relevant (yet sometimes outright wrong, irrelevant or biased) outputs in response to provided prompts, using them in various writing tasks, including writing peer review reports, could result in improved productivity. Given the significance of peer reviews in the existing scholarly publication landscape, exploring the challenges and opportunities of using LLMs in peer review seems urgent. After the generation of the first scholarly outputs with LLMs, we anticipate that peer review reports, too, will be generated with the help of these systems. However, there are currently no guidelines on how these systems should be used in review tasks.

Methods: To investigate the potential impact of using LLMs on the peer review process, we used five core themes within discussions about peer review suggested by Tennant and Ross-Hellauer. These include (1) reviewers’ role, (2) editors’ role, (3) functions and quality of peer reviews, (4) reproducibility, and (5) the social and epistemic functions of peer reviews. We provide a small-scale exploration of ChatGPT’s performance regarding the identified issues.

Results: LLMs have the potential to substantially alter the role of both peer reviewers and editors. By supporting both actors in efficiently writing constructive reports or decision letters, LLMs can facilitate higher-quality review and address issues of review shortage. However, the fundamental opacity of LLMs’ training data, inner workings, data handling, and development processes raises concerns about potential biases, confidentiality and the reproducibility of review reports. Additionally, as editorial work has a prominent function in defining and shaping epistemic communities, as well as negotiating normative frameworks within such communities, partly outsourcing this work to LLMs might have unforeseen consequences for social and epistemic relations within academia. Regarding performance, we identified major enhancements in a short period and expect LLMs to continue developing.

Conclusions: We believe that LLMs are likely to have a profound impact on academia and scholarly communication. While potentially beneficial to the scholarly communication system, many uncertainties remain and their use is not without risks. In particular, concerns about the amplification of existing biases and inequalities in access to appropriate infrastructure warrant further attention. For the moment, we recommend that if LLMs are used to write scholarly reviews and decision letters, reviewers and editors should disclose their use and accept full responsibility for data security and confidentiality, and for their reports’ accuracy, tone, reasoning and originality.

Funder

National Center for Advancing Translational Sciences

Publisher

Springer Science and Business Media LLC

Subject

General Earth and Planetary Sciences, General Environmental Science

Link

https://link.springer.com/content/pdf/10.1186/s41073-023-00133-5.pdf

References (32 articles)

1. Blanco-Gonzalez A, Cabezon A, Seco-Gonzalez A, Conde-Torres D, Antelo-Riveiro P, Pineiro A, et al. The Role of AI in Drug Discovery: Challenges, Opportunities, and Strategies. arXiv; 2022 [cited 2022 Dec 27]. Available from: http://arxiv.org/abs/2212.08104

2. Gao CA, Howard FM, Markov NS, Dyer EC, Ramesh S, Luo Y, et al. Comparing scientific abstracts generated by ChatGPT to original abstracts using an artificial intelligence output detector, plagiarism detector, and blinded human reviewers. bioRxiv; 2022 [cited 2023 Jan 31]. p. 2022.12.23.521610. Available from: https://www.biorxiv.org/content/10.1101/2022.12.23.521610v1

3. Schulz R, Barnett A, Bernard R, Brown NJL, Byrne JA, Eckmann P, et al. Is the future of peer review automated? BMC Res Notes. 2022;15(1):203.

4. Weissgerber T, Riedel N, Kilicoglu H, Labbé C, Eckmann P, ter Riet G, et al. Automated screening of COVID-19 preprints: can we help authors to improve transparency and reproducibility? Nat Med. 2021;27(1):6–7.

5. Tennant JP, Ross-Hellauer T. The limitations to our understanding of peer review. Res Integr Peer Rev. 2020;5(1):6.

Cited by 79 articles

1. Integrating generative AI in management education: A mixed-methods study using social construction of technology theory; The International Journal of Management Education; 2024-11

2. Investigating questionable research practices among Iranian applied linguists: Prevalence, severity, and the role of artificial intelligence tools; System; 2024-10

3. Revolutionizing EFL writing: unveiling the strategic use of ChatGPT by Indonesian master’s students; Cogent Education; 2024-09-06

4. Generative AI-assisted Peer Review in Medical Publications: Opportunities Or Trap (Preprint); 2024-09-02

5. A comparative genre analysis of AI-generated and scholar-written abstracts for English review articles in international journals; Journal of English for Academic Purposes; 2024-09