BACKGROUND
Reducing lapses in care for people living with HIV (PLWH) is critical to ending the HIV epidemic and benefits the health of individual PLWH. Predictive modeling can identify clinical factors associated with lapses in HIV care. Previous studies have identified these factors within single clinics or across national clinic networks, but public health strategies to improve retention in care in the U.S. often operate within a regional jurisdiction (e.g., a city or county).
OBJECTIVE
We sought to build predictive models of lapses in HIV care using a large, multi-site, non-curated database of electronic health records (EHRs) in Chicago.
METHODS
We used data from 2011 to 2019 in the Chicago Area Patient-Centered Outcomes Research Network (CAPriCORN), a database of 11 health systems containing 12.8 million patients that covers the majority of the 23,580 diagnosed PLWH in Chicago. CAPriCORN uses a hash-based deduplication method to follow patients across multiple Chicago healthcare systems with different EHRs, providing a unique city-wide view of retention in care for PLWH. From the database, we used diagnosis codes, medications, laboratory tests, demographics, and encounter information to build predictive models. Our primary outcome was a lapse in HIV care, defined as more than 12 months between consecutive HIV care encounters. We built logistic regression, random forest, elastic net logistic regression, and XGBoost models using all variables and compared their performance with that of a baseline logistic regression model containing only demographics and retention history.
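The outcome definition can be made concrete with a short sketch. It assumes encounters arrive as hypothetical (patient_id, encounter_date) pairs, approximates "more than 12 months" as a gap of more than 365 days, and labels each encounter by whether the next HIV care encounter came later than that. The abstract does not say how the 12-month boundary or each patient's final encounter was handled, so those choices (and the function name label_lapses) are illustrative assumptions, not the authors' method.

```python
from collections import defaultdict
from datetime import date

# Assumption: "more than 12 months" is approximated as a gap of > 365 days.
LAPSE_THRESHOLD_DAYS = 365


def label_lapses(encounters, threshold_days=LAPSE_THRESHOLD_DAYS):
    """Label each HIV care encounter by whether the next one came too late.

    encounters: iterable of (patient_id, encounter_date) tuples.
    Returns {patient_id: [(encounter_date, lapse), ...]} sorted by date, where
    lapse is True/False when a later encounter exists and None for the
    patient's final encounter (its outcome is not observed in this sketch).
    Patients with fewer than two encounters are excluded, mirroring the
    inclusion criterion described in the abstract.
    """
    by_patient = defaultdict(list)
    for patient_id, encounter_date in encounters:
        by_patient[patient_id].append(encounter_date)

    labels = {}
    for patient_id, dates in by_patient.items():
        if len(dates) < 2:
            continue  # need at least two encounters to observe a gap
        dates = sorted(dates)
        rows = []
        # Compare each encounter with the one that immediately follows it.
        for current, following in zip(dates, dates[1:]):
            gap_days = (following - current).days
            rows.append((current, gap_days > threshold_days))
        rows.append((dates[-1], None))  # final encounter: no next visit seen
        labels[patient_id] = rows
    return labels


if __name__ == "__main__":
    example = [
        ("A", date(2015, 1, 10)),
        ("A", date(2015, 6, 1)),
        ("A", date(2016, 9, 1)),
        ("B", date(2017, 3, 3)),  # only one encounter, so excluded
    ]
    for pid, rows in label_lapses(example).items():
        print(pid, rows)
```

For patient "A", the 142-day gap is not a lapse, while the 458-day gap from June 2015 to September 2016 is.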
RESULTS
We included PLWH with at least two HIV care encounters in the database, resulting in 16,930 PLWH with a total of 191,492 encounters. All models outperformed the baseline logistic regression model, with the greatest improvement from the XGBoost model (AUC 0.776 [0.768-0.784] vs 0.674 [0.664-0.683], P<.001). Top predictors included history of lapses in care, being seen by an Infectious Disease provider (vs a primary care provider), site of care, Hispanic ethnicity, and previous laboratory testing for HIV. The random forest model (AUC 0.751 [0.742-0.759]) also identified age, insurance type, and chronic comorbidities, such as hypertension, as important predictors of a lapse in care.
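The bracketed ranges after each AUC appear to be 95% confidence intervals, but the abstract does not state how they were computed. One common approach is a percentile bootstrap. The pure-Python sketch below (hypothetical functions auc and bootstrap_auc_ci) computes the rank-based (Mann-Whitney) AUC, with tied scores given their average rank, and resamples it. It does not reproduce the P value comparison, which would need a paired test such as DeLong's.

```python
import random


def auc(y_true, scores):
    """Rank-based (Mann-Whitney) AUC for binary labels coded 0/1."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i.
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # average 1-based rank for the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(y_true, scores, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) interval."""
    point = auc(y_true, scores)  # raises early if only one class is present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # skip resamples that lack one of the classes
            stats.append(auc(yb, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return point, (stats[lo_idx], stats[hi_idx])


if __name__ == "__main__":
    labels = [0, 0, 1, 1, 0, 1, 0, 1]
    preds = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.6, 0.55]
    print(bootstrap_auc_ci(labels, preds, n_boot=500))
```

Because each PLWH contributes many encounters, resampling patients rather than individual encounters would likely give more realistic intervals; that cluster-bootstrap variant is not shown here.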
CONCLUSIONS
We used a real-world approach to leverage the full scope of data available in modern EHRs to predict lapses in HIV care. Our findings reinforce previously known factors, such as a history of prior lapses in care, while also showing the importance of laboratory testing, chronic comorbidities, sociodemographic characteristics, and clinic-specific factors for predicting lapses in care for PLWH in Chicago. We provide a framework for others to use EHR data from multiple healthcare systems within a single city to examine lapses in care, which will aid jurisdictional efforts to improve retention in HIV care.
Keywords: Human Immunodeficiency Virus; predictive model; lapse in care; retention in care; people living with HIV; Chicago; HIV care continuum; electronic health record