Author:
Angel Mirana, Xing Haiyi, Patel Anuj, Alachkar Amal, Baldi Pierre
Abstract

Background
There has been considerable recent effort to integrate Large Language Models (LLMs) into different fields, including healthcare. However, the possibility of applying LLMs in pharmacy-related sciences remains under-explored.

Objectives
This study aims to evaluate the capabilities and limitations of six LLMs (GPT-3.5, GPT-4, Llama-2-7B, Llama-2-13B, Llama-2-70B, and Mistral-7B) in the field of pharmacy by assessing their reasoning abilities on a sample of the North American Pharmacist Licensure Examination (NAPLEX). Additionally, we explore the potential impacts of LLMs on pharmacy education and practice.

Methods
To evaluate the LLMs, we used a sample NAPLEX exam comprising 225 multiple-choice questions sourced from the APhA Complete Review for Pharmacy, 13th Edition. These questions were presented to the LLMs either through local deployment or through an application programming interface (API), and the answers generated by the LLMs were then compared with the answer key.

Results
There was a notable disparity in the performance of the LLMs. GPT-4 emerged as the top performer, accurately answering 87.1% of the questions. Among the six LLMs evaluated, GPT-4 was the only model capable of passing the NAPLEX exam.

Conclusion
We examined the performance of the LLMs in relation to their model size, training methods, and fine-tuning algorithms. Given the continuous evolution of LLMs, it is reasonable to anticipate that future models will easily excel on exams such as the NAPLEX. This highlights the significant potential of LLMs to influence the field of pharmacy. Hence, we must evaluate both the positive and negative implications of integrating LLMs into pharmacy education and practice.
Publisher
Cold Spring Harbor Laboratory