Abstract
IntroductionPopulation-level and individual-level analyses have strengths and limitations as do ‘blackbox’ machine learning (ML) and traditional, interpretable models. Diabetes mellitus (DM) is a leading cause of morbidity and mortality with complex sociodemographic dynamics that have not been analyzed in a way that leverages population-level and individual-level data as well as traditional epidemiological and ML models. We analyzed complementary individual-level and county-level datasets with both regression and ML methods to study the association between sociodemographic factors and DM.Research design and methodsCounty-level DM prevalence, demographics, and socioeconomic status (SES) factors were extracted from the 2018 Robert Wood Johnson Foundation County Health Rankings and merged with US Census data. Analogous individual-level data were extracted from 2007 to 2016 National Health and Nutrition Examination Survey studies and corrected for oversampling with survey weights. We used multivariate linear (logistic) regression and ML regression (classification) models for county (individual) data. Regression and ML models were compared using measures of explained variation (area under the receiver operating characteristic curve (AUC) and R2).ResultsAmong the 3138 counties assessed, the mean DM prevalence was 11.4% (range: 3.0%–21.1%). Among the 12 824 individuals assessed, 1688 met DM criteria (13.2% unweighted; 10.2% weighted). Age, gender, race/ethnicity, income, and education were associated with DM at the county and individual levels. Higher county Hispanic ethnic density was negatively associated with county DM prevalence, while Hispanic ethnicity was positively associated with individual DM. ML outperformed regression in both datasets (mean R2 of 0.679 vs 0.610, respectively (p<0.001) for county-level data; mean AUC of 0.737 vs 0.727 (p<0.0427) for individual-level data).ConclusionsHispanic individuals are at higher risk of DM, while counties with larger Hispanic populations have lower DM prevalence. Analyses of population-level and individual-level data with multiple methods may afford more confidence in results and identify areas for further study.
Subject
Endocrinology, Diabetes and Metabolism
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献