UNSTRUCTURED
The volume of digital data in healthcare is continually growing. Beyond their primary use in patient care, the health data collected can also serve secondary purposes, such as research. In this context, Clinical Data Warehouses (CDWs) provide the infrastructure and organization needed to improve the secondary use of health data. Various data models have been proposed for organizing data in a CDW, including the i2b2 model, whose persistence layer relies on a relational database that can suffer from performance problems when queries run against massive datasets. In this article, we evaluate the technical feasibility and performance of an i2b2 implementation based on the NoSQL database system Elasticsearch, using the Bordeaux University Hospital CDW, which includes data on 2.5 million patients and over 3 billion observations. We propose adaptations of the i2b2 model to take into account the specific features of Elasticsearch. We demonstrate that an Elasticsearch implementation is feasible and yields significant improvements in both query performance and the disk space used for storage. This implementation is currently used in production at Bordeaux University Hospital.