Affiliation:
1. National Institute of Informatics, Japan
Abstract
Large Language Models (LLMs) face limitations in logical reasoning that restrict their applicability in critical domains such as law. Moreover, because current evaluation methods are overly simple, they often yield inaccurate assessments of LLMs' capabilities. This paper presents a refined evaluation method for assessing LLMs' ability to answer legal questions, designed to eliminate the possibility of obtaining correct responses by chance. We also introduce the LogiLaw dataset, intended to strengthen models' logical reasoning in general and legal reasoning in particular. By combining the refined evaluation technique, the LogiLaw dataset, and the proposed Reinforcement Learning from Logical Feedback (RLLF) approach, this work opens new avenues for improving LLMs' performance in law and other logic-intensive disciplines while addressing the shortcomings of conventional evaluation approaches.