Abstract
An impressive number of COVID-19 data catalogs exist. None, however, are optimized for data science applications; for example, inconsistent naming and data conventions, uneven quality control, and a lack of alignment between disease data and potential predictors pose barriers to robust modeling and analysis. To address this gap, we generated a unified dataset that integrates data from numerous leading sources of COVID-19 epidemiological and environmental data and applies quality checks to them. We use a globally consistent hierarchy of administrative units to facilitate analysis within and across countries. The dataset applies this unified hierarchy to align COVID-19 case data with a number of other data types relevant to understanding and predicting COVID-19 risk, including hydrometeorological data, air quality, information on COVID-19 control policies, and key demographic characteristics.
Publisher
Cold Spring Harbor Laboratory