Peer review of GPT-4 technical report and systems card

Authors:

Jack Gallifant, Amelia Fiske, Yulia A. Levites Strekalova, Juan S. Osorio-Valencia, Rachael Parke, Rogers Mwavu, Nicole Martinez, Judy Wawira Gichoya, Marzyeh Ghassemi, Dina Demner-Fushman, Liam G. McCoy, Leo Anthony Celi, Robin Pierce

Abstract

This study provides a comprehensive review of OpenAI’s Generative Pre-trained Transformer 4 (GPT-4) technical report, with an emphasis on applications in high-risk settings such as healthcare. A diverse team of experts in artificial intelligence (AI), natural language processing, public health, law, policy, social science, healthcare research, and bioethics analyzed the report against established peer review guidelines. Key strengths identified include the considerable time and economic investment in transparent AI research and the creation of a comprehensive systems card for risk assessment and mitigation. However, the lack of clarity around training processes and data raises concerns about the biases and interests encoded in GPT-4; the report also lacks confidence and uncertainty estimations, which are crucial in high-risk areas such as healthcare, and fails to address potential privacy and intellectual property issues. The study further emphasizes the need for diverse, global involvement in developing and evaluating large language models (LLMs) to ensure broad societal benefits and mitigate risks. The paper presents recommendations including improving data transparency, developing accountability frameworks, establishing confidence standards for LLM outputs in high-risk settings, and enhancing industry research review processes. It concludes that while GPT-4’s report is a step toward open discussion of LLMs, more extensive interdisciplinary review is essential for addressing concerns about bias, harm, and risk, especially in high-risk domains. The review aims to broaden understanding of LLMs and highlights the need for new forms of reflection on how LLMs are reviewed, the data required for effective evaluation, and how critical issues such as bias and risk are addressed.

Publisher

Public Library of Science (PLoS)

