Abstract
Objective
Epidemiological research using electronic healthcare records (EHR), which informs everyday patient care, uses combinations of codes (“codelists”) to define diseases and prescriptions (or phenotypes). Yet methodology for codelist generation varies, which can introduce misclassification bias, and drug codelists raise their own specific considerations.

Materials and Methods
We developed methods to generate drug codelists and tested them using the Clinical Practice Research Datalink (CPRD) Aurum database, accounting for missing data in “attribute” search variables. We generated codelists for 1) cardiovascular disease and 2) inhaled Chronic Obstructive Pulmonary Disease (COPD) therapies, applying them to a sample cohort of 335,931 COPD patients. We compared searching on all search variables (A, “gold standard”) with searching on B) chemical and C) ontological information only.

Results
In Search A we identified 165,150 patients prescribed cardiovascular drugs (49.2% of cohort) and 317,963 prescribed COPD inhalers (94.7% of cohort). Considering output per value set, Search C missed substantial numbers of prescriptions, including vasodilator anti-hypertensives (A and B: 19,696 prescriptions; C: 1,145) and SAMA inhalers (A and B: 35,310; C: 564).

Discussion
We recommend the full methods (A) for comprehensiveness. There are special considerations when generating adaptable and generalizable drug codelists, including fluctuating status, cohort-specific drug indications, underlying hierarchical ontology, and statistical analyses.

Conclusions
Methods must have end-to-end clinical input, and be standardizable, reproducible, and understandable to all researchers across data contexts.

LAY ABSTRACT
Health research using patient records informs everyday medicine, using groups of codes (“codelists”) to define diseases and drugs. Yet methods to create drug codelists are inconsistent, may not include physician expertise, and may not be reported.

We developed a reproducible method to create drug codelists, testing it using de-identified healthcare records. We generated codelists for 1) heart conditions and 2) inhalers to identify prescriptions in a sample group of 335,931 patients with chronic lung disease. We compared our full methods (Search A) to two restricted searches to show that prescriptions can be missed if the necessary considerations are not made.

In Search A, we identified 165,150 people (49.2% of sample group) prescribed drugs from the heart codelist. For lung inhalers, we identified 317,963 people (94.7% of group). Search C missed substantial numbers of prescriptions for drugs lowering blood pressure by opening vessels (A and B: 19,696 prescriptions; C: 1,145) and for short-term inhalers opening airways (A and B: 35,310; C: 564).

We recommend the full methods (A) for completeness. Drug codelist methods must be consistent, duplicable, and include physician input at all research stages, and must take account of special considerations including status (eg, new, taken off market), disease, and drug categorical system. Quality methods should be freely accessible and usable across study contexts.
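The comparison of Searches A, B, and C can be illustrated with a minimal Python sketch. It is not the authors' implementation: the product rows, the field names (`term`, `substance`, `ontology`), the keywords, and the choice of which fields count as "chemical" or "ontological" are all assumptions made for illustration. It shows only the general idea that a search restricted to one attribute misses products whose value for that attribute is missing.

```python
# Toy product dictionary. Every row and field name is invented for illustration;
# None marks a missing "attribute" value.
PRODUCTS = [
    {"code": "p1", "term": "Ipratropium bromide 20micrograms/dose inhaler",
     "substance": "Ipratropium bromide", "ontology": "Antimuscarinic bronchodilators"},
    {"code": "p2", "term": "Ipratropium bromide 40micrograms/dose inhaler",
     "substance": "Ipratropium bromide", "ontology": None},
    {"code": "p3", "term": "Ipratropium 20micrograms/dose inhaler CFC free",
     "substance": None, "ontology": None},
    {"code": "p4", "term": "Salbutamol 100micrograms/dose inhaler",
     "substance": "Salbutamol", "ontology": "Selective beta2 agonists"},
]

# Hypothetical search terms for one value set (SAMA inhalers).
CHEMICAL_KEYWORDS = ["ipratropium"]
ONTOLOGY_KEYWORDS = ["antimuscarinic bronchodilators"]


def search(products, keywords, fields):
    """Return codes of products where any listed field contains any keyword.

    Matching is a case-insensitive substring test; missing values are skipped.
    """
    lowered = [k.lower() for k in keywords]
    hits = set()
    for row in products:
        for field in fields:
            value = row.get(field)
            if value is None:
                continue  # missing attribute: nothing to match on
            if any(k in value.lower() for k in lowered):
                hits.add(row["code"])
                break
    return hits


# Search A (assumed): all keywords against all fields.
search_a = search(PRODUCTS, CHEMICAL_KEYWORDS + ONTOLOGY_KEYWORDS,
                  ["term", "substance", "ontology"])
# Search B (assumed): chemical keywords against the substance field only.
search_b = search(PRODUCTS, CHEMICAL_KEYWORDS, ["substance"])
# Search C (assumed): ontology keywords against the ontology field only.
search_c = search(PRODUCTS, ONTOLOGY_KEYWORDS, ["ontology"])

print("A:", sorted(search_a))
print("B:", sorted(search_b))
print("C:", sorted(search_c))
```

On this toy data Search A returns three codes, Search B two, and Search C one, because the ontology field is populated for only one of the ipratropium products. The counts say nothing about the real CPRD Aurum results reported above.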
Publisher
Cold Spring Harbor Laboratory