RNA-seq preprocessing and sample size considerations for gene network inference

Published: 2023-01-03

Author:

Altay Gökmen, Zapardiel-Gonzalo Jose, Peters Bjoern

Abstract

Background: Gene network inference (GNI) methods have the potential to reveal functional relationships between different genes and their products. Most GNI algorithms were developed for microarray gene expression datasets, and their application to RNA-seq data is relatively recent. Because the characteristics of RNA-seq data differ from those of microarray data, it remains an open question which preprocessing methods should be applied to RNA-seq data prior to GNI to attain optimal performance, and what sample size is required to obtain reliable GNI estimates.

Results: We ran 9144 analyses of 7 different RNA-seq datasets to evaluate 300 different preprocessing combinations that include data transformations, normalizations and association estimators. We found that there was no single best-performing preprocessing combination but that there were several good ones. Performance varied widely across datasets, which emphasizes the importance of choosing an appropriate preprocessing configuration before GNI. Two preprocessing combinations appeared promising in general: first, Log-2 TPM (transcripts per million) with variance-stabilizing transformation (VST) and the Pearson correlation coefficient (PCC) as association estimator; second, raw RNA-seq count data with PCC. Along with these two, we also identified 18 other good preprocessing combinations. Any of these combinations might perform best on a given dataset. Therefore, the GNI performance of these approaches should be measured on any new dataset to select the best-performing one for it. In terms of the required biological sample size of RNA-seq data, we found that between 30 and 85 samples were required to generate reliable GNI estimates.

Conclusions: This study provides practical recommendations on default choices for data preprocessing prior to GNI analysis of RNA-seq data to obtain optimal performance.
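To make the kind of preprocessing combination discussed in the abstract more concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a genes-by-samples count matrix and per-gene lengths in kilobases, converts counts to log2 TPM, and uses the Pearson correlation coefficient as the association estimator. The function names, the pseudocount of 1, and the correlation threshold of 0.8 are illustrative assumptions, and the VST step is omitted because the abstract does not describe how it was applied.

```python
import numpy as np


def counts_to_log2_tpm(counts, gene_lengths_kb, pseudocount=1.0):
    """Convert a genes x samples count matrix to log2(TPM + pseudocount).

    The pseudocount is an assumption made here to avoid log2(0).
    """
    counts = np.asarray(counts, dtype=float)
    lengths = np.asarray(gene_lengths_kb, dtype=float)
    # Length-normalize each gene (reads per kilobase).
    rate = counts / lengths[:, None]
    # Scale each sample (column) so its values sum to one million.
    tpm = rate / rate.sum(axis=0, keepdims=True) * 1e6
    return np.log2(tpm + pseudocount)


def pcc_network(expr, threshold=0.8):
    """Return the gene-gene Pearson correlation matrix and a thresholded
    adjacency matrix. The threshold is a hypothetical choice."""
    corr = np.corrcoef(expr)  # rows (genes) are treated as variables
    adjacency = np.abs(corr) >= threshold
    np.fill_diagonal(adjacency, False)  # ignore self-associations
    return corr, adjacency


# Toy data for illustration only: 3 genes x 4 samples.
toy_counts = np.array([
    [10, 20, 30, 40],
    [5, 9, 16, 21],
    [40, 30, 20, 10],
])
toy_lengths_kb = [1.0, 2.0, 0.5]

log_tpm = counts_to_log2_tpm(toy_counts, toy_lengths_kb)
corr, adjacency = pcc_network(log_tpm)
print(np.round(corr, 2))
print(adjacency)
```

The second combination highlighted in the abstract, raw counts with PCC, would correspond to calling `pcc_network` directly on the count matrix. Since the abstract reports that no single combination is best everywhere, any such sketch would need to be evaluated on the dataset at hand.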

Publisher

Cold Spring Harbor Laboratory
