Enhancing targeted outreach for longitudinal surveys: predictive analytics for participant response in the Millennium Cohort Study

Author:

Barkho Wisam (1), Carnes Nathan (1), Kolaja Claire (1), Tu Xin (2), Boparai Satbir (1), Castañeda Sheila F. (1), Sheppard Beverly D. (1), Walstrom Jennifer L. (1), Belding Jennifer N. (1), Rull Rudolph P. (1)

Affiliation:

1. Naval Health Research Center

2. University of California San Diego

Abstract

Background: The Millennium Cohort Study is a prospective cohort study designed to examine the long-term effects of military service. The study collects self-reported data from surveys administered every 3–5 years to military personnel and veterans. Participant nonresponse to follow-up surveys presents a potential threat to the validity and generalizability of study findings. In recent years, predictive analytics has emerged as a promising tool to identify predictors of nonresponse.

Methods: We present a method that uses machine learning to develop a high-skill classifier that predicts participant response to Millennium Cohort Study follow-up surveys. Using a temporal cross-validation method, six supervised algorithms, each with a different learning strategy, were employed to predict response to the 2021 follow-up survey. Using latent class analysis (LCA), we classified participants based on historical survey response and compared prediction performance with and without this variable. Feature analysis was subsequently conducted on the best-performing model to identify the most important factors.

Results: The LCA variable consisted of six distinct classes: consistent web responders, consistent paper responders, mode switchers, early consistent nonresponders, late consistent nonresponders, and inconsistent responders. Notably, the LCA classes differed significantly across various military and demographic characteristics. When the LCA variable was included, all six algorithms performed comparably to one another. Without the LCA variable, however, random forest outperformed the benchmark regression model by 9.3% in area under the receiver operating characteristic curve (ROC AUC) and by 7.7% in area under the precision-recall curve. Moreover, including the LCA variable increased the ROC AUC by 10% or more for all algorithms except random forest and interaction forest, which improved by approximately 5%. Feature analysis indicated that the LCA variable was the most important predictor in the model.

Conclusions: Our findings highlight the importance of historical response patterns, which significantly improved prediction of participant response to follow-up surveys. Machine learning algorithms can be especially valuable when historical data are not available. Implementing these methods in longitudinal studies can enhance outreach efforts by strategically targeting participants, ultimately boosting survey response rates and mitigating nonresponse.
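The abstract names its evaluation measures and a temporal cross-validation method but gives no implementation details. The following is a minimal Python sketch of those ideas, not the authors' pipeline. It assumes an expanding-window scheme (train on earlier survey waves, test on the next one), which the abstract does not specify, and it uses average precision as one common estimate of the area under the precision-recall curve. The function names, wave numbers, labels, and scores are all hypothetical toy values.

```python
from typing import List, Sequence, Tuple


def temporal_folds(waves: Sequence[int]) -> List[Tuple[List[int], int]]:
    """Expanding-window splits: train on all earlier waves, test on the next.

    Assumption: the source does not describe its exact temporal scheme.
    """
    ordered = sorted(waves)
    return [(ordered[:i], ordered[i]) for i in range(1, len(ordered))]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a randomly chosen responder (label 1)
    is scored above a randomly chosen nonresponder (label 0); ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one nonresponder")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision values at each responder's rank.

    Assumes no tied scores; ties would need threshold-level grouping.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("need at least one responder")
    return total / hits


# Hypothetical toy data: 1 = responded to the follow-up survey, 0 = did not.
labels = [1, 1, 0, 1, 0, 0]
# Hypothetical predicted response probabilities from two models.
scores_without_history = [0.9, 0.4, 0.5, 0.7, 0.3, 0.6]
scores_with_history = [0.9, 0.8, 0.3, 0.7, 0.2, 0.4]

for name, scores in [("without history", scores_without_history),
                     ("with history", scores_with_history)]:
    print(name, round(roc_auc(labels, scores), 3),
          round(average_precision(labels, scores), 3))

# Each fold trains on earlier waves and tests on the following wave.
print(temporal_folds([1, 2, 3, 4]))
```

In practice a library implementation of these metrics would be preferable; the pairwise loop here is quadratic and is written out only to make the definition explicit.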

Publisher

Research Square Platform LLC

