Affiliation:
1. Bioinformatics and Human Electrophysiology Laboratory, Department of Informatics, Ionian University, 49100 Corfu, Greece
2. Department of Mathematics, University of Patras, 26504 Patras, Greece
Abstract
Rapid advances in data generation techniques have spurred innovation across many domains. This review surveys data generation methodologies, focusing on statistical and machine learning-based approaches. Recent strategies such as the divide-and-conquer (DC) approach and models such as GANBLR address challenges ranging from preserving intricate data relationships to improving interpretability. The adoption of generative adversarial networks (GANs) has also reshaped data generation in sectors such as healthcare, cybersecurity, and retail. The review examines how these techniques mitigate class imbalance, data scarcity, and privacy concerns. By analyzing evaluation metrics and diverse applications, it highlights the efficacy and potential of synthetic data for refining predictive models and decision-making software. It concludes with prospective research directions and the evolving role of synthetic data in advancing machine learning and data-driven solutions across disciplines, giving a broad view of contemporary data generation methodologies.