Affiliation:
1. Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, Indiana, USA; email: fliu2@nd.edu
Abstract
In the healthcare landscape, data science (DS) methods have emerged as indispensable tools to harness real-world data (RWD) from various data sources such as electronic health records, claims and registry data, and data gathered from digital health technologies. Real-world evidence (RWE) generated from RWD empowers researchers, clinicians, and policymakers with a more comprehensive understanding of real-world patient outcomes. Nevertheless, persistent challenges in RWD (e.g., messiness, voluminousness, heterogeneity, multimodality) and a growing awareness of the need for trustworthy and reliable RWE demand innovative, robust, and valid DS methods for analyzing RWD. In this article, I review some common current DS methods for extracting RWE and valuable insights from complex and diverse RWD. This article encompasses the entire RWE-generation pipeline, from study design with RWD to data preprocessing, exploratory analysis, methods for analyzing RWD, and trustworthiness and reliability guarantees, along with data ethics considerations and open-source tools. This review, tailored for readers who may not be experts in DS, aspires to offer a systematic review of DS methods and to assist readers in selecting suitable DS methods and enhancing the process of RWE generation for addressing their specific challenges.