GPT-4 passes the bar exam

Published: 2024-02-26 Issue: 2270 Volume: 382
ISSN:1364-503X
Container-title:Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
language:en
Short-container-title:Phil. Trans. R. Soc. A.

Author:

Katz Daniel Martin¹²³⁴, Bommarito Michael James¹²³⁴, Gao Shang⁵, Arredondo Pablo²⁵

Affiliation:

1. Illinois Tech, Chicago Kent College of Law, Chicago, IL, USA

2. CodeX, The Stanford Center for Legal Informatics, Stanford, CA, USA

3. Bucerius Law School, Hamburg, Germany

4. 273 Ventures, LLC, USA

5. Casetext, Inc., USA

Abstract

In this paper, we experimentally evaluate the zero-shot performance of GPT-4 against prior generations of GPT on the entire uniform bar examination (UBE), including not only the multiple-choice multistate bar examination (MBE), but also the open-ended multistate essay exam (MEE) and multistate performance test (MPT) components. On the MBE, GPT-4 significantly outperforms both human test-takers and prior models, demonstrating a 26% increase over ChatGPT and beating humans in five of seven subject areas. On the MEE and MPT, which have not previously been evaluated by scholars, GPT-4 scores an average of 4.2/6.0 when compared with much lower scores for ChatGPT. Graded across the UBE components, in the manner in which a human test-taker would be, GPT-4 scores approximately 297 points, significantly in excess of the passing threshold for all UBE jurisdictions. These findings document not just the rapid and remarkable advance of large language model performance generally, but also the potential for such models to support the delivery of legal services in society. This article is part of the theme issue ‘A complexity science approach to law and governance’.

Publisher

The Royal Society

Link

https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.2023.0254

Reference: 86 articles.

1. Measuring Law Over Time: A Network Analytical Framework with an Application to Statutes and Regulations in the United States and Germany

2. Chalkidis I, Jana A, Hartung D, Bommarito M, Androutsopoulos I, Katz D, Aletras N. 2022 LexGLUE: a benchmark dataset for legal language understanding in English. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, pp. 4310–4330.

3. Complexity and Entropy in Legal Language

4. Measuring the complexity of the law: the United States Code

5. Law’s complexity: a primer;Ruhl JB;Ga. St. UL Rev.,2007

Cited by 41 articles.

1. ChatGPT compared to national guidelines for management of ovarian cancer: Did ChatGPT get it right? – A Memorial Sloan Kettering Cancer Center Team Ovary study;Gynecologic Oncology;2024-10

2. ChatGPTest: Opportunities and Cautionary Tales of Utilizing AI for Questionnaire Pretesting;Field Methods;2024-09-12

3. Putting GPT-4o to the Sword: A Comprehensive Evaluation of Language, Vision, Speech, and Multimodal Proficiency;Applied Sciences;2024-09-03

4. BB-GeoGPT: A framework for learning a large language model for geographic information science;Information Processing & Management;2024-09

5. Large language models for automatic equation discovery of nonlinear dynamics;Physics of Fluids;2024-09-01