Exploration of Machine Learning and Statistical Techniques in Development of a Low-Cost Screening Method Featuring the Global Diet Quality Score for Detecting Prediabetes in Rural India

Authors:

Birk Nick (1,2), Matsuzaki Mika (3,4), Fung Teresa T (5), Li Yanping (3), Batis Carolina (6), Stampfer Meir J (3,7,8), Deitchler Megan (9), Willett Walter C (3,7,8), Fawzi Wafaie W (10), Bromage Sabri (3), Kinra Sanjay (2), Bhupathiraju Shilpa N (3,8), Lake Erin (1)

Affiliations:

1. Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, MA, USA

2. Department of Non-Communicable Disease Epidemiology, London School of Hygiene and Tropical Medicine, University of London, London, United Kingdom

3. Department of Nutrition, Harvard TH Chan School of Public Health, Boston, MA, USA

4. Department of International Health, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA

5. Nutrition Department, Simmons University, Boston, MA, USA

6. CONACYT—Health and Nutrition Research Center, National Institute of Public Health, Cuernavaca, Mexico

7. Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA, USA

8. Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, USA

9. Intake—Center for Dietary Assessment, FHI Solutions, Washington, DC, USA

10. Department of Global Health and Population, Harvard TH Chan School of Public Health, Boston, MA, USA

Abstract

Background: The prevalence of type 2 diabetes has increased substantially in India over the past 3 decades. Undiagnosed diabetes presents a public health challenge, especially in rural areas, where access to laboratory testing for diagnosis may not be readily available.

Objectives: The present work explores the use of several machine learning and statistical methods in the development of a predictive tool to screen for prediabetes using survey data from an FFQ to compute the Global Diet Quality Score (GDQS).

Methods: The outcome variable, prediabetes status (yes/no), used throughout this study was defined as a fasting blood glucose measurement ≥100 mg/dL. The algorithms utilized included the generalized linear model (GLM), random forest, least absolute shrinkage and selection operator (LASSO), elastic net (EN), and generalized linear mixed model (GLMM) with family unit as a cluster-level random intercept to account for intrafamily correlation. Model performance was assessed on held-out test data, and comparisons were made with respect to area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.

Results: The GLMM, GLM, LASSO, and random forest modeling techniques each performed well (AUCs >0.70) and included the GDQS food groups and age, among other predictors. The fully adjusted GLMM, which included a random intercept for family unit, achieved slightly superior results (AUC of 0.72) in classifying the prediabetes outcome in these cluster-correlated data.

Conclusions: The models presented in the current work show promise in identifying individuals at risk of developing diabetes, although further studies are necessary to assess other potentially impactful predictors, as well as the consistency and generalizability of model performance. In addition, future studies to examine the utility of the GDQS in screening for other noncommunicable diseases are recommended.
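The outcome definition and the three evaluation metrics named in the Methods can be illustrated with a minimal Python sketch. This is not the authors' code: the glucose values and predicted risk scores below are invented for illustration, the function names are hypothetical, and the 0.5 classification cutoff is an assumption (the abstract does not state one). Only the ≥100 mg/dL glucose threshold comes from the abstract.

```python
from typing import List, Tuple

# Fasting blood glucose cutoff stated in the abstract (mg/dL).
PREDIABETES_THRESHOLD_MG_DL = 100.0


def prediabetes_label(fasting_glucose_mg_dl: float) -> int:
    """Return 1 if fasting glucose is >= 100 mg/dL, else 0."""
    return int(fasting_glucose_mg_dl >= PREDIABETES_THRESHOLD_MG_DL)


def auc(labels: List[int], scores: List[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(
    labels: List[int], scores: List[float], cutoff: float
) -> Tuple[float, float]:
    """Classify as positive when score >= cutoff, then return
    (sensitivity, specificity)."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical held-out test data (illustrative values, not from the study).
glucose = [92.0, 105.0, 88.0, 118.0, 99.0, 101.0]
risk_scores = [0.20, 0.65, 0.40, 0.80, 0.55, 0.35]  # assumed model outputs

labels = [prediabetes_label(g) for g in glucose]
test_auc = auc(labels, risk_scores)
sens, spec = sensitivity_specificity(labels, risk_scores, cutoff=0.5)  # assumed cutoff
print(f"AUC={test_auc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

In a screening setting the cutoff would typically be tuned to trade sensitivity against specificity rather than fixed at 0.5; the AUC summarizes performance across all possible cutoffs.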

Funder

Bill & Melinda Gates Foundation

Intake – Center for Dietary Assessment

Wellcome Trust

Publisher

Oxford University Press (OUP)

Subject

Nutrition and Dietetics, Medicine (miscellaneous)
