Augmentation of Transcriptomic Data for Improved Classification of Patients with Respiratory Diseases of Viral Origin

Published:2022-02-24 Issue:5 Volume:23 Page:2481
ISSN:1422-0067
Container-title:International Journal of Molecular Sciences
language:en
Short-container-title:IJMS

Author:

Kircher Magdalena, Chludzinski Elisa, Krepel Jessica, Saremi Babak, Beineke Andreas, Jung Klaus

Abstract

To better understand the molecular basis of respiratory diseases of viral origin, high-throughput gene-expression data are frequently generated using DNA microarray or RNA-seq technology. Such data can also be useful to classify infected individuals by molecular signatures in the form of machine-learning models with genes as predictor variables. Early diagnosis of patients by molecular signatures could also contribute to better treatments. An approach that has rarely been considered for machine-learning models in the context of transcriptomics is data augmentation. For other data types, it has been shown that augmentation can improve classification accuracy and prevent overfitting. Here, we compare three strategies for data augmentation of DNA microarray and RNA-seq data from two selected studies on respiratory diseases of viral origin. The first study involves samples of patients with either viral or bacterial origin of the respiratory disease; the second study involves patients with either SARS-CoV-2 or another respiratory virus as disease origin. Specifically, we reanalyze these public datasets to study whether patient classification by transcriptomic signatures can be improved when adding artificial data for training of the machine-learning models. Our comparison reveals that augmentation of transcriptomic data can improve the classification accuracy and that fewer genes are necessary as explanatory variables in the final models. We also report genes from our signatures that overlap with signatures presented in the original publications of our example data. Due to strict selection criteria, the molecular role of these genes in the context of respiratory infectious diseases is underlined.

Funder

Deutsche Forschungsgemeinschaft

Publisher

MDPI AG

Subject

Inorganic Chemistry,Organic Chemistry,Physical and Theoretical Chemistry,Computer Science Applications,Spectroscopy,Molecular Biology,General Medicine,Catalysis

Link

https://www.mdpi.com/1422-0067/23/5/2481/pdf

References: 58 articles.

1. Transcriptomic Biomarkers to Discriminate Bacterial from Nonbacterial Infection in Adults Hospitalized with Respiratory Illness

2. Characterization of cellular transcriptomic signatures induced by different respiratory viruses in human reconstituted airway epithelia

3. Epigenomics and Transcriptomics in the Prediction and Diagnosis of Childhood Asthma: Are We There Yet?

4. Whole Blood Gene Expression Profiles to Assess Pathogenesis and Disease Severity in Infants with Respiratory Syncytial Virus Infection

5. Support vector machine classification and validation of cancer tissue samples using microarray expression data

Cited by 7 articles.

1. A Machine Learning Model for the Prediction of COVID-19 Severity Using RNA-Seq, Clinical, and Co-Morbidity Data;Diagnostics;2024-06-18

2. Mask-cscGAN for realistic synthetic cell generation;2023 IEEE International Conference on Big Data (BigData);2023-12-15

3. Data‐driven Bayesian network learning analysis on the regulatory mechanism between carcinogenic genes and immune cells;Clinical and Translational Discovery;2023-12

4. Signature Informed Sampling for Transcriptomic Data;2023-10-31

5. MS-ACGAN: A modified auxiliary classifier generative adversarial network for schizophrenia's samples augmentation based on microarray gene expression data;Computers in Biology and Medicine;2023-08