PVS-GEN: Systematic Approach for Universal Synthetic Data Generation Involving Parameterization, Verification, and Segmentation
Author:
Kim Kyung-Min¹, Kwak Jong Wook¹
Affiliation:
1. Department of Computer Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
Abstract
Synthetic data generation addresses the challenges of obtaining extensive empirical datasets, offering benefits such as cost-effectiveness, time efficiency, and robust model development. Nonetheless, synthetic data-generation methodologies still face significant difficulties, including a lack of standardized metrics for modeling different data types and comparing generated results. This study introduces PVS-GEN, an automated, general-purpose process for synthetic data generation and verification. PVS-GEN parameterizes time-series data with minimal human intervention and verifies the constructed model using a metric derived from the extracted parameters. For complex data, the process iteratively segments the empirical dataset until the extracted parameters can reproduce synthetic data that reflect the empirical characteristics, irrespective of the sensor data type. Moreover, we introduce the PoR metric to quantify the quality of the generated data by evaluating its time-series characteristics. Consequently, the proposed method can automatically generate diverse time-series data covering a wide range of sensor types. We compared PVS-GEN with existing synthetic data-generation methodologies, and PVS-GEN demonstrated superior performance: measured with the proposed metric, the similarity of its generated data was higher by up to 37.1% across multiple data types and by 19.6% on average, irrespective of the data type.
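The abstract describes a parameterize, verify, and segment loop without giving its details. The sketch below illustrates only that control flow, under loudly stated assumptions that do not come from the paper: each segment is parameterized by its mean and population standard deviation; verification is a hypothetical windowed-mean check (a stand-in, not the PoR metric, whose definition is not given here); a segment that fails verification is split in half; and synthetic samples are drawn from a Gaussian per accepted segment. All function names are hypothetical.

```python
import random
import statistics
from typing import List, Tuple

# Hypothetical per-segment parameters: (mean, population standard deviation).
Params = Tuple[float, float]
Segment = Tuple[int, int, Params]  # (start index, end index exclusive, params)


def parameterize(segment: List[float]) -> Params:
    """Extract an assumed minimal parameter set from one segment."""
    return statistics.fmean(segment), statistics.pstdev(segment)


def verify_segment(segment: List[float], params: Params,
                   windows: int = 4, tol: float = 0.5) -> bool:
    """Hypothetical stand-in check, NOT the paper's PoR metric.

    Passes when every window mean of the real segment lies within
    tol * std of the model mean, i.e. a single-level model also
    captures how the local level evolves over time.
    """
    mean, std = params
    scale = max(std, 1e-9)  # avoid a zero tolerance on flat segments
    size = max(1, len(segment) // windows)
    for i in range(0, len(segment), size):
        window_mean = statistics.fmean(segment[i:i + size])
        if abs(window_mean - mean) > tol * scale:
            return False
    return True


def _split(data: List[float], start: int, end: int,
           min_len: int, tol: float, out: List[Segment]) -> None:
    """Recursively halve data[start:end] until its parameters pass verification."""
    seg = data[start:end]
    params = parameterize(seg)
    if verify_segment(seg, params, tol=tol) or end - start < 2 * min_len:
        out.append((start, end, params))  # accept, or stop at the minimum length
        return
    mid = (start + end) // 2  # assumed split rule: halve the segment
    _split(data, start, mid, min_len, tol, out)
    _split(data, mid, end, min_len, tol, out)


def segment_and_generate(data: List[float], tol: float = 0.5, min_len: int = 8,
                         seed: int = 0) -> Tuple[List[Segment], List[float]]:
    """Segment until every piece verifies, then sample synthetic data per piece."""
    if not data:
        raise ValueError("data must be non-empty")
    segments: List[Segment] = []
    _split(data, 0, len(data), min_len, tol, segments)
    rng = random.Random(seed)
    synthetic: List[float] = []
    for start, end, (mean, std) in segments:
        # Gaussian sampling per segment is an assumption for illustration.
        synthetic.extend(rng.gauss(mean, std) for _ in range(end - start))
    return segments, synthetic


if __name__ == "__main__":
    step = [0.0] * 16 + [10.0] * 16  # a level shift halfway through
    segments, synthetic = segment_and_generate(step)
    print([(s, e) for s, e, _ in segments])  # [(0, 16), (16, 32)]
```

On the step series, one mean/standard-deviation pair cannot reproduce the level shift, so the check fails and the series is split into two flat halves that each pass. Halving is simply the easiest rule to show; a more realistic sketch might split at detected change points instead.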
Funder
National Research Foundation of Korea
Subject
Electrical and Electronic Engineering; Biochemistry; Instrumentation; Atomic and Molecular Physics, and Optics; Analytical Chemistry