Abstract
The development of artificial intelligence (AI) is having a major impact on the healthcare sector. However, one of the primary challenges in implementing this technology is access to high-quality data, which is limited by data collection issues and regulatory constraints; synthetic data are an emerging alternative. This scoping review analyses reviews published in the past 10 years and indexed in three databases (PubMed, Scopus, and Web of Science) to identify the healthcare domains where synthetic data are currently generated, the motivations behind their creation, their future uses, their limitations, and the types of data generated. A total of 13 main domains were identified, with oncology, neurology, and cardiology being the most frequently mentioned. Five types of motivation and three principal future uses were also identified. Furthermore, the predominant type of data generated was found to be unstructured, particularly images. Finally, several directions for future work are suggested, including exploring new domains and less commonly used data types (e.g., video and text), and developing an evaluation benchmark and standard generative models for specific domains.
Publisher
Cold Spring Harbor Laboratory