Increasing prediction accuracy of pathogenic staging by sample augmentation with a GAN

Journal: PLOS ONE (PLoS ONE)
Published: 2021-04-27; Volume 16, Issue 4, Page e0250458
ISSN: 1932-6203
Language: English

Authors:

Kwon ChangHyuk, Park Sangjin, Ko Soohyun, Ahn Jaegyoon

Abstract

Accurate prediction of cancer stage is important because it enables more appropriate treatment for patients with cancer. Many measures and methods have been proposed for predicting cancer stage more accurately. Recently, however, machine learning methods, especially deep learning-based ones, have been receiving increasing attention, mostly owing to their good prediction accuracy in many applications. Machine learning methods can be applied to high-throughput DNA mutation or RNA expression data to predict cancer stage. However, because the number of genes or markers generally exceeds 10,000, a considerable number of data samples is required to guarantee high prediction accuracy. To overcome the small number of available clinical samples, we used Generative Adversarial Networks (GANs) to augment the samples. Because GANs are not effective when applied to all genes, we first selected significant genes using DNA mutation data and random forest feature ranking. Next, the RNA expression data for the selected genes were expanded using GANs. We compared classification accuracies on the original dataset and on expanded datasets generated by the proposed and existing methods, using random forest, Deep Neural Networks (DNNs), and 1-Dimensional Convolutional Neural Networks (1DCNN). With the 1DCNN, the F1 score of GAN5 (a 5-fold increase in data) improved by 39% relative to the original data. Moreover, the results using only 30% of the data were better than those using all of the data. Ours is the first attempt to use GANs to augment numeric data for both DNA and RNA. The augmented datasets obtained using the proposed method yielded significantly higher classification accuracy in most cases. By using GANs and a 1DCNN to predict cancer stage, we confirmed that good results can be obtained even with a small number of samples, and we expect that much of the cost and time required to obtain clinical samples can be reduced.
The proposed sample augmentation method could also be applied for other purposes, such as prognostic prediction or cancer classification.
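The abstract does not specify the GAN architecture or training setup, so the following is only a minimal sketch of the adversarial augmentation idea, not the authors' implementation. It assumes the random-forest gene selection has already produced a small matrix of RNA expression values (samples by selected genes). It uses a toy linear generator and a logistic discriminator over each gene's value and its square, trained with alternating stochastic gradient steps in pure Python. The names (`train_toy_gan`, `sample`), the toy data, and the reading of "GAN5" as a final dataset five times the original size are all hypothetical assumptions.

```python
import math
import random
import statistics


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def clip(v: float, bound: float = 5.0) -> float:
    """Clip a gradient component to [-bound, bound] to keep the toy stable."""
    return max(-bound, min(bound, v))


def train_toy_gan(real, noise_dim=2, steps=1000, lr=0.005, decay=0.01, seed=0):
    """Fit a toy GAN to `real`, a list of samples (each a list of floats).

    Generator:     x = A z + b, with z ~ N(0, I)
    Discriminator: D(x) = sigmoid(w.x + u.(x*x) + c)
    Training runs in per-feature z-scored space; returns sample(m).
    """
    rng = random.Random(seed)
    d = len(real[0])
    mu = [statistics.fmean(col) for col in zip(*real)]
    sd = [statistics.pstdev(col) or 1.0 for col in zip(*real)]
    data = [[(x[i] - mu[i]) / sd[i] for i in range(d)] for x in real]

    A = [[rng.gauss(0.0, 0.1) for _ in range(noise_dim)] for _ in range(d)]
    b = [0.0] * d
    w, u, c = [0.0] * d, [0.0] * d, 0.0

    def generate(z):
        return [sum(A[i][j] * z[j] for j in range(noise_dim)) + b[i] for i in range(d)]

    def logit(x):
        return sum(w[i] * x[i] + u[i] * x[i] * x[i] for i in range(d)) + c

    for _ in range(steps):
        x_real = rng.choice(data)
        z = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
        x_fake = generate(z)

        # Discriminator step: minimise -[log D(real) + log(1 - D(fake))]
        # plus L2 weight decay; dLoss/dlogit is D - 1 (real) and D (fake).
        g_real = sigmoid(logit(x_real)) - 1.0
        g_fake = sigmoid(logit(x_fake))
        for i in range(d):
            w[i] -= lr * (g_real * x_real[i] + g_fake * x_fake[i] + decay * w[i])
            u[i] -= lr * (g_real * x_real[i] ** 2 + g_fake * x_fake[i] ** 2 + decay * u[i])
        c -= lr * (g_real + g_fake)

        # Generator step (non-saturating loss -log D(G(z))) against the updated
        # discriminator: dLoss/dx_i = (D - 1) * (w_i + 2 u_i x_i), clipped and
        # chained through x = A z + b.
        g = sigmoid(logit(x_fake)) - 1.0
        for i in range(d):
            dx = clip(g * (w[i] + 2.0 * u[i] * x_fake[i]))
            for j in range(noise_dim):
                A[i][j] -= lr * dx * z[j]
            b[i] -= lr * dx

    def sample(m):
        """Draw m synthetic samples, mapped back to the original scale."""
        out = []
        for _ in range(m):
            x = generate([rng.gauss(0.0, 1.0) for _ in range(noise_dim)])
            out.append([x[i] * sd[i] + mu[i] for i in range(d)])
        return out

    return sample


# Hypothetical toy data: 20 samples x 3 selected genes (not from the paper).
data_rng = random.Random(1)
real = [[data_rng.gauss(5.0, 1.0) for _ in range(3)] for _ in range(20)]
sample = train_toy_gan(real)

# Assumed reading of "GAN5": keep the originals and add 4x synthetic samples.
augmented = real + sample(4 * len(real))
print(len(augmented))  # 100
```

Because this discriminator only sees each gene's value and its square, it can at best push the generator toward matching per-gene means and variances. A practical setup would use multilayer networks in a deep-learning framework. The gradient clipping and weight decay are stabilizing choices for this toy, not details taken from the paper.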

Funder

National Research Foundation of Korea

Publisher

Public Library of Science (PLoS)

Subject

Multidisciplinary


Cited by 12 articles.

1. Building gender-specific sexually transmitted infection risk prediction models using CatBoost algorithm and NHANES data;BMC Medical Informatics and Decision Making;2024-01-24

2. Intelligent phenotype-detection and gene expression profile generation with generative adversarial networks;Journal of Theoretical Biology;2024-01

3. Mdwgan-gp: data augmentation for gene expression data based on multiple discriminator WGAN-GP;BMC Bioinformatics;2023-11-13

4. Joint triplet loss with semi-hard constraint for data augmentation and disease prediction using gene expression data;Scientific Reports;2023-10-24

5. MS-ACGAN: A modified auxiliary classifier generative adversarial network for schizophrenia's samples augmentation based on microarray gene expression data;Computers in Biology and Medicine;2023-08