Abstract
Background
Guaranteeing durability, provenance, accessibility, and trust in open data sets can be challenging for researchers and organizations that rely on public repositories of data critical for epidemiology and other health analytics. The required data repositories are often difficult to locate, and the data they hold may require conversion to a standard format. Data-hosting websites may also change or become unavailable without warning. A single change to the rules in one repository can hinder updates to a public dashboard that relies on data pulled from external sources. These concerns are particularly challenging at the international level, because policies on systems aimed at harmonizing health and related data are typically dictated by national governments to serve their individual needs.
Objective
In this paper, we introduce a comprehensive public health data platform, EpiGraphHub, that aims to provide a single interoperable repository for open health and related data.
Methods
The platform, curated by the international research community, allows secure local integration of sensitive data while facilitating the development of data-driven applications and reports for decision-makers. Its main components include centrally managed databases with fine-grained access control to data, fully automated and documented data collection and transformation, and a powerful web-based data exploration and visualization tool.
Results
EpiGraphHub is already used to host a growing collection of open data sets and to automate epidemiological analyses based on them. The project has also released an open-source software library containing the analytical methods used in the platform.
Conclusions
The platform is fully open source and open to external users. It is in active development with the goal of maximizing its value for large-scale public health studies.