Abstract
ABSTRACTIntroductionTraditional surveillance methods have been enhanced by the emergence of online participatory syndromic surveillance systems that collect health-related digital data. These systems have many applications including tracking weekly prevalence of Influenza-Like Illness (ILI), predicting probable infection of Coronavirus 2019 (COVID-19), and determining risk factors of ILI and COVID-19. However, not every volunteer consistently completes surveys. In this study, we assess how different missing data methods affect estimates of ILI burden using data from FluTracking, a participatory surveillance system in Australia.MethodsWe estimate the incidence rate, the incidence proportion, and weekly prevalence using five missing data methods: available case, complete case, assume missing is non-ILI, multiple imputation (MI), and delta (δ) MI, which is a flexible and transparent method to impute missing data under Missing Not at Random (MNAR) assumptions. We evaluate these methods using simulated and FluTracking data.ResultsOur simulations show that the optimal missing data method depends on the measure of ILI burden and the underlying missingness model. Of note, the δ-MI method provides estimates of ILI burden that are similar to the true parameter under MNAR models. When we apply these methods to FluTracking, we find that the δ-MI method accurately predicted complete, end of season weekly prevalence estimates from real-time data.ConclusionMissing data is an important problem in participatory surveillance systems. Here, we show that accounting for missingness using statistical approaches leads to different inferences from the data.
Publisher
Cold Spring Harbor Laboratory