Abstract
Understanding the subtle confluence of factors that triggers pan-continental, seasonal epidemics of influenza-like illness is an extremely important problem, with the potential to save tens of thousands of lives and billions of dollars every year in the US alone. Beginning with several large, longitudinal datasets on putative factors and clinical data on the disease and health status of over 150 million human subjects observed over a decade, we investigated the source and the mechanistic triggers of epidemics. Our analysis included insurance claims for a significant cross-section of the US population in the past decade, human movement patterns inferred from billions of tweets, whole-US weekly weather data covering the same time span as the medical records, data on vaccination coverage over the same period, and sequence variations of key viral proteins. We also explicitly accounted for the spatio-temporal autocorrelation of infectious waves and a host of socioeconomic and demographic factors. We carried out multiple orthogonal statistical analyses on these diverse, large geo-temporal datasets to bolster and corroborate our findings. We conclude that the initiation of a pan-continental influenza wave emerges from the simultaneous realization of a complex set of conditions. The strongest predictor groups, ranked by importance, are: (1) the host population’s socio- and ethno-demographic properties; (2) weather variables pertaining to relevant area specific humidity, temperature, and solar radiation; (3) the virus’ antigenic drift over time; (4) the host population’s land-based travel habits; and (5) the spatio-temporal dynamics’ immediate history, as reflected in the influenza wave autocorrelation. The models we infer are demonstrably predictive (area under the Receiver Operating Characteristic curve ≈ 80%) when tested with out-of-sample data, opening the door to the potential formulation of new population-level intervention and mitigation policies.
Publisher
Cold Spring Harbor Laboratory