Abstract
We describe a multi-factor model of the spread of COVID-19 across the 58 counties of California from March 2020 to June 2023. The model estimates cumulative cases and the duration of the epidemic as functions of five independent variables, which are themselves correlated factors: population, population density, family income, Gini coefficient, and land area (size) of each county. The correlation coefficients of these factors are used to reduce the error in the models of cumulative cases and duration.
The model produces two linear equations: one for cumulative cases and one for duration of infection. The cumulative case estimate is highly correlated with population, but it improves when all five factors are considered. The duration-of-infection estimate improves when population and income level are considered. We also find that infection rate (cases per capita) varies widely and is roughly normally distributed, suggesting randomness rather than correlation with one or more of the five factors.
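As an illustration of what a linear equation in five factors looks like in practice, the following minimal sketch fits one by ordinary least squares. It is not the fitting procedure of this paper, which uses the correlation coefficients of the factors; ordinary least squares is substituted here as a familiar stand-in. The data are synthetic, the factors are standardized and uncorrelated for simplicity, and every name and coefficient (fit_linear, true_beta, and so on) is hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(rows, targets):
    """Least-squares fit of targets ~ b0 + b1*x1 + ... + bk*xk.

    Returns [b0, b1, ..., bk] by solving the normal equations
    (X^T X) beta = X^T y.
    """
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic stand-in for 58 counties and 5 standardized factors
# (population, density, income, Gini coefficient, land area).
# All values below are made up; they are NOT the paper's data.
random.seed(0)
true_beta = [2.0, 1.5, -0.5, 0.3, 0.8, -0.2]  # intercept + 5 coefficients
rows = [[random.gauss(0, 1) for _ in range(5)] for _ in range(58)]
targets = [true_beta[0] + sum(b * x for b, x in zip(true_beta[1:], r))
           for r in rows]

beta = fit_linear(rows, targets)
```

Because the synthetic targets are noise-free, beta recovers true_beta up to floating-point error. With real county data the factors are correlated and differ widely in scale, so standardizing them first, as assumed here, helps keep the normal equations well conditioned.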
We also observe vast differences between counties with high and low cumulative case counts. Using the same model but with different correlation values for the same factors, we obtain very accurate models of both bottom-half and top-half counties, i.e., counties with 49,000 or fewer cumulative cases versus counties with 50,000 to 3.5 million cases. This suggests that multi-factor models are a suitable alternative to traditional diffusion models, especially when an analysis of causal factors is preferred over estimates of cumulative cases over time.
An attempt to predict the cumulative cases at the end of the pandemic (1191 days) at one- and two-year intervals reveals the challenge of prediction. The model improves as more data accumulate but lacks predictive power. We speculate that the model might be extended by noting the convergence rate of successive approximations and then extrapolating. This is left as future work.
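One possible reading of extrapolating from the convergence rate of successive approximations is Aitken's delta-squared process, sketched below. It takes three successive estimates and, assuming they approach a limit roughly geometrically, estimates that limit. This is offered only as an illustration of the idea and is not a method developed or tested in this paper; the function name and the sample numbers are hypothetical.

```python
def aitken_extrapolate(s0, s1, s2):
    """Estimate the limit of a sequence from three successive terms.

    Uses Aitken's delta-squared formula, which is exact when the error
    shrinks by a constant ratio at each step (geometric convergence).
    """
    second_difference = s2 - 2.0 * s1 + s0
    if second_difference == 0:
        # The formula is undefined here; fall back to the latest term.
        return s2
    return s2 - (s2 - s1) ** 2 / second_difference


# Made-up successive estimates whose error halves at each step:
# s_n = 10 - 8 * 0.5**n gives 2, 6, 8 and converges to 10.
limit = aitken_extrapolate(2.0, 6.0, 8.0)
```

In this toy case the extrapolated value equals the true limit because the sequence is exactly geometric. Real successive model estimates would only approximate that behavior, so the result should be treated as a rough guide.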