GANs in the Panorama of Synthetic Data Generation Methods

Published: 2024-04-10
ISSN: 1551-6857
Container-title: ACM Transactions on Multimedia Computing, Communications, and Applications
Language: en
Short-container-title: ACM Trans. Multimedia Comput. Commun. Appl.

Author:

Vaz Bruno¹, Figueira Álvaro²

Affiliation:

1. Faculty of Sciences, University of Porto, Porto, Portugal

2. Computer Science, University of Porto, Porto, Portugal

Abstract

This paper focuses on the creation and evaluation of synthetic data to address the challenges of imbalanced datasets in machine learning (ML) applications, using fake news detection as a case study. We conducted a thorough literature review on generative adversarial networks (GANs) for tabular data, synthetic data generation methods, and synthetic data quality assessment. By augmenting a public news dataset with synthetic data generated by different GAN architectures, we demonstrate the potential of synthetic data to improve ML models’ performance in fake news detection. Our results show a significant improvement in classification performance, especially in the underrepresented class. We also modify and extend a data usage approach to evaluate the quality of synthetic data and investigate the relationship between synthetic data quality and data augmentation performance in classification tasks. We found a positive correlation between synthetic data quality and performance in the underrepresented class, highlighting the importance of high-quality synthetic data for effective data augmentation.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3657294
