BACKGROUND
A primary goal of precision medicine is to identify patient subgroups and infer their underlying disease processes, with the aim of designing targeted interventions. However, few methods automatically identify patient subgroups together with their co-occurring characteristics, measure the significance of those subgroups, and visualize the results. Such methods could enhance the interpretability of patient subgroups and inform the design of classification and predictive models.
OBJECTIVE
To analyze subgroups among patients readmitted to hospital using a three-step modeling approach: (1) visual analytical modeling to automatically identify patient subgroups and their co-occurring comorbidities, and to determine their statistical significance and clinical interpretability; (2) classification modeling to classify patients into subgroups and measure the accuracy of that classification; and (3) prediction modeling to predict a patient’s risk of readmission and compare predictive accuracy with and without patient subgroup information.
METHODS
We extracted 2013-2014 Medicare data related to hospital readmission in three conditions: chronic obstructive pulmonary disease (COPD), congestive heart failure (CHF), and total hip/knee arthroplasty (THA/TKA). For each condition, we extracted cases defined as patients readmitted within 30 days of hospital discharge, and controls defined as patients not readmitted within 90 days of discharge, matched by age, gender, race, and Medicaid eligibility (n[COPD]=29,016, n[CHF]=51,550, n[THA/TKA]=16,498). These data were analyzed using: (1) bipartite networks to identify patient subgroups based on frequently co-occurring high-risk comorbidities; (2) multinomial logistic regression to classify patients into subgroups; and (3) hierarchical logistic regression to predict the risk of hospital readmission using subgroup membership, compared to standard logistic regression without subgroup membership.
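As an illustration of step (1), the following minimal sketch scores a given partition of a patient-comorbidity bipartite network by its modularity. The abstract does not say which modularity formulation or clustering algorithm was used, so the sketch assumes Barber's bipartite modularity, Q = (1/m) Σ (A_ij − k_i d_j / m) over same-subgroup patient-comorbidity pairs. The function name, argument names, and toy data are hypothetical.

```python
from collections import Counter


def bipartite_modularity(edges, patient_group, comorbidity_group):
    """Barber-style bipartite modularity for a fixed partition (assumed formulation).

    edges: iterable of unique (patient, comorbidity) pairs.
    patient_group / comorbidity_group: dicts mapping each node to a subgroup label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("the network has no edges")

    # Degree of each patient and each comorbidity.
    patient_degree = Counter(p for p, _ in edges)
    comorbidity_degree = Counter(c for _, c in edges)

    # Observed number of edges that fall inside a subgroup.
    observed = sum(
        1 for p, c in edges if patient_group[p] == comorbidity_group[c]
    )

    # Expected number under the degree-preserving null model:
    # sum over subgroups of (patient degree total * comorbidity degree total) / m.
    group_patient_deg = Counter()
    for p, deg in patient_degree.items():
        group_patient_deg[patient_group[p]] += deg
    group_comorbidity_deg = Counter()
    for c, deg in comorbidity_degree.items():
        group_comorbidity_deg[comorbidity_group[c]] += deg
    expected = sum(
        group_patient_deg[g] * group_comorbidity_deg[g] for g in group_patient_deg
    ) / m

    return (observed - expected) / m


# Toy example (hypothetical data): two patients with disjoint comorbidities.
toy_edges = [("p1", "diabetes"), ("p2", "renal_failure")]
split = bipartite_modularity(
    toy_edges,
    {"p1": "A", "p2": "B"},
    {"diabetes": "A", "renal_failure": "B"},
)
merged = bipartite_modularity(
    toy_edges,
    {"p1": "A", "p2": "A"},
    {"diabetes": "A", "renal_failure": "A"},
)
```

In this toy network, separating the two patient-comorbidity pairs scores higher than merging them into one subgroup, which is the behavior a modularity-maximizing method relies on. Assessing significance would additionally require comparing Q against randomized networks, which this sketch does not do.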
RESULTS
(1) In each condition, the visual analytical model identified patient subgroups that were statistically significant (Q=0.17, 0.17, 0.31; P<.001, <.001, <.05), were significantly replicated (RI=0.92, 0.94, 0.89; P<.001, <.001, <.01), and were clinically meaningful to clinicians. (2) In each condition, the classification model had high accuracy in classifying patients into subgroups (mean accuracy=99.60%, 99.34%, 99.86%). (3) In two conditions (COPD, THA/TKA), the hierarchical prediction model showed a small but statistically significant improvement in discriminating between readmitted and non-readmitted patients as measured by net reclassification improvement (NRI=0.059, 0.11), but not as measured by the C-statistic or integrated discrimination improvement (IDI).
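To make the comparison metrics in (3) concrete, the following minimal sketch computes NRI and IDI for two sets of predicted risks. The abstract does not state which NRI variant was used, so the sketch assumes the category-free (continuous) form; the function names and the toy risks are hypothetical and are not the study's data.

```python
def continuous_nri(y_true, p_base, p_new):
    """Category-free NRI (assumed variant).

    Net share of readmitted patients whose predicted risk rises, plus
    net share of non-readmitted patients whose predicted risk falls.
    """
    up_event = down_event = n_event = 0
    up_nonevent = down_nonevent = n_nonevent = 0
    for y, base, new in zip(y_true, p_base, p_new):
        if y == 1:
            n_event += 1
            up_event += new > base
            down_event += new < base
        else:
            n_nonevent += 1
            up_nonevent += new > base
            down_nonevent += new < base
    if n_event == 0 or n_nonevent == 0:
        raise ValueError("need at least one event and one non-event")
    return (up_event - down_event) / n_event + (
        down_nonevent - up_nonevent
    ) / n_nonevent


def idi(y_true, p_base, p_new):
    """IDI: mean risk change among events minus mean risk change among non-events."""
    event_diffs = [n - b for y, b, n in zip(y_true, p_base, p_new) if y == 1]
    nonevent_diffs = [n - b for y, b, n in zip(y_true, p_base, p_new) if y == 0]
    if not event_diffs or not nonevent_diffs:
        raise ValueError("need at least one event and one non-event")
    return sum(event_diffs) / len(event_diffs) - sum(nonevent_diffs) / len(
        nonevent_diffs
    )


# Toy example (hypothetical risks): 1 = readmitted, 0 = not readmitted.
y = [1, 1, 0, 0]
risk_without_subgroups = [0.5, 0.5, 0.5, 0.5]
risk_with_subgroups = [0.6, 0.4, 0.4, 0.4]
nri_value = continuous_nri(y, risk_without_subgroups, risk_with_subgroups)
idi_value = idi(y, risk_without_subgroups, risk_with_subgroups)
```

NRI counts only the direction of each patient's risk change, whereas IDI averages its magnitude, so a model can improve on one measure without improving on the other.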
CONCLUSIONS
While the visual analytical models identified statistically and clinically significant patient subgroups, the results point to the need to analyze subgroups at different levels of granularity to improve the interpretability of intra- and inter-cluster associations. The high accuracy of the classification models reflects the strong separation of the patient subgroups despite the size and density of the datasets. Finally, the small improvement in predictive accuracy suggests that comorbidities alone were not strong predictors of hospital readmission, and points to the need for more sophisticated subgroup modeling methods. Such advances could improve the interpretability and predictive accuracy of patient subgroup models for reducing the risk of hospital readmission and beyond.