BACKGROUND
Pharmacoepidemiologic studies require drugs recorded in real-world data sources to be classified according to the Anatomical Therapeutic Chemical (ATC) system. This classification enables standardized analysis of drug utilization patterns and safety monitoring, ultimately promoting rational drug use and improving health outcomes. Proprietary tools for this purpose are expensive, while free tools lack generalizability. Large language models (LLMs) such as GPT-4o offer a cost-effective alternative, as they can explain their choice of ATC code for a drug and return the output in a structured format.
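For readers unfamiliar with the ATC hierarchy, the sketch below shows why a single drug can map to more than one class. A full ATC code has seven characters, and its first three characters (anatomical main group letter plus two-digit therapeutic subgroup) form the 2nd-level code. Acetylsalicylic acid (aspirin) is listed under several ATC codes, including N02BA01 (under N02, analgesics) and B01AC06 (under B01, antithrombotic agents). The function name `level2` and the validation pattern are illustrative assumptions, not part of the study's pipeline.

```python
import re

# Two of the WHO ATC codes listed for acetylsalicylic acid (aspirin),
# one per therapeutic use discussed in this study.
ASPIRIN_ATC = {
    "analgesic": "N02BA01",       # N02 = analgesics
    "antithrombotic": "B01AC06",  # B01 = antithrombotic agents
}

# Full 7-character ATC code: anatomical main group letter, two-digit
# therapeutic subgroup, two letters (levels 3 and 4), two digits (level 5).
ATC_PATTERN = re.compile(r"[ABCDGHJLMNPRSV]\d{2}[A-Z]{2}\d{2}")


def level2(code: str) -> str:
    """Return the 2nd-level (therapeutic subgroup) prefix of a full ATC code."""
    code = code.strip().upper()
    if not ATC_PATTERN.fullmatch(code):
        raise ValueError(f"not a 7-character ATC code: {code!r}")
    return code[:3]


for use, code in ASPIRIN_ATC.items():
    print(use, code, "->", level2(code))
# analgesic N02BA01 -> N02
# antithrombotic B01AC06 -> B01
```

Because the 2nd-level code depends on therapeutic use rather than on the molecule alone, a classifier must infer the intended use, for example from the dose, before it can assign the class.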
OBJECTIVE
This paper seeks to establish LLMs as an assistive technology for drug classification, a prerequisite for sound pharmacoepidemiologic research. This requires developing AI prompts and data-processing procedures and showing that the resulting accuracy, efficiency, and effectiveness are as good as or better than those of established methods.
METHODS
Patients residing in the US and Canada whose medications were scheduled through a smart medication dispenser, the “spencer SmartHub” (Spencer Health Solutions, Inc., Morrisville, NC), were included in this study if they had a scheduled medication refill in 2024 and had consented to the use of their data for research. An AI prompt requesting the best and next-best 2nd-level ATC codes for de-identified daily-dose strings was developed iteratively with guidance from an expert in clinical research, digital medicine, and regulatory affairs. The initial prompt was designed to classify aspirin at various doses as either an analgesic or an antithrombotic. Once a prompt succeeded at this, it was applied to a pilot sample of 20 daily-dose strings, and its outputs were graded by the expert; the prompt was revised for as long as the pilot produced more than one incorrect response. The final prompt was then applied to an inference sample of n=200 daily-dose strings drawn without replacement. Finite-population inference was performed on the proportions of correct and approximately correct ATC classifications, and all errors made by the algorithm were reviewed.
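The prompt-and-parse step can be sketched as follows, assuming the model is asked to reason step by step and then write the best and next-best 2nd-level codes on a final line separated by a '|' delimiter. The template wording, the '|' delimiter, the example reply, and the names `build_prompt`, `parse_response`, and `AtcGuess` are all hypothetical; the study's actual prompt is not reproduced here, and the call that sends the prompt to GPT-4o is omitted.

```python
import re
from typing import NamedTuple

# Hypothetical prompt template (not the study's actual wording).
PROMPT_TEMPLATE = (
    "You are classifying medications using the WHO ATC system.\n"
    "Daily-dose string: {dose}\n"
    "Reason step by step about the drug's identity and its likely "
    "therapeutic use at this dose. Then, on the final line only, output "
    "the best and next-best 2nd-level ATC codes separated by '|', e.g. B01|N02."
)

# A 2nd-level ATC code: anatomical main group letter + two digits.
LEVEL2 = re.compile(r"[ABCDGHJLMNPRSV]\d{2}")


class AtcGuess(NamedTuple):
    best: str
    next_best: str


def build_prompt(dose: str) -> str:
    """Fill the hypothetical template with one de-identified daily-dose string."""
    return PROMPT_TEMPLATE.format(dose=dose)


def parse_response(text: str) -> AtcGuess:
    """Parse the final non-empty line; raise if the delimiter format was not followed."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty response")
    parts = [p.strip().upper() for p in lines[-1].split("|")]
    if len(parts) != 2 or not all(LEVEL2.fullmatch(p) for p in parts):
        raise ValueError(f"malformed final line: {lines[-1]!r}")
    return AtcGuess(best=parts[0], next_best=parts[1])


# Hypothetical model reply, for illustration only.
reply = "Low-dose aspirin is typically taken for platelet inhibition.\nB01|N02"
print(parse_response(reply))  # AtcGuess(best='B01', next_best='N02')
```

Rejecting malformed final lines, rather than guessing, keeps delimiter failures visible so they can be counted and reviewed alongside classification errors.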
RESULTS
There were 3,371 de-identified patients who met the inclusion criteria: 2,908 (86%) residing in Canada and 463 (14%) residing in the United States. Together they contributed 12,294 daily-dose strings. The initial prompt, which used few-shot learning and concise output, could not distinguish aspirin’s analgesic use from its antithrombotic use. A revised prompt using chain-of-thought reasoning succeeded and was 100% correct on the pilot sample of n=20. In the inference sample, the expert deemed a proportion of 0.96 (80% CI 0.943-0.978) of classifications correct; the approximately correct designation was never used. The most common errors were classifying dietary supplements as medications, misidentifying a drug, and failing to follow delimiter instructions.
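As a rough check on the reported interval, the sketch below computes a normal-approximation (Wald) interval with a finite population correction, treating the 200 strings as a simple random sample drawn without replacement from all 12,294 daily-dose strings. The abstract does not state which interval estimator or population size was used, so both are assumptions here, and the function name `fpc_wald_ci` is hypothetical. Under these assumptions the lower bound comes out near 0.942, slightly below the reported 0.943.

```python
from math import sqrt
from statistics import NormalDist
from typing import Tuple


def fpc_wald_ci(p_hat: float, n: int, N: int, level: float = 0.80) -> Tuple[float, float]:
    """Wald CI for a proportion from a simple random sample of n units
    drawn without replacement from a finite population of N units."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value (~1.2816 for 80%)
    fpc = (N - n) / (N - 1)                     # finite population correction
    se = sqrt(p_hat * (1 - p_hat) / n * fpc)    # corrected standard error
    return p_hat - z * se, p_hat + z * se


lo, hi = fpc_wald_ci(0.96, n=200, N=12_294)
print(f"{lo:.3f}-{hi:.3f}")  # 0.942-0.978
```

With n/N of about 1.6%, the correction shrinks the standard error by less than 1%, so the interval is nearly identical to the uncorrected one.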
CONCLUSIONS
GPT-4o offers an accurate, efficient, and effective approach for augmenting real-world drug databases with ATC drug classes. It gives research teams everywhere access to a powerful tool for meeting a key prerequisite of pharmacoepidemiologic analysis of real-world data from across the globe.