Investigation of normalization procedures for transcriptome profiles of compounds oriented toward practical study design

Published: 2023-10-02

Author:

Mizuno Tadahaya, Kusuhara Hiroyuki

Abstract

The transcriptome profile is a representative phenotype-based descriptor of compounds, widely acknowledged for its ability to capture compound effects. However, batch differences are inevitable, and although sophisticated statistical methods exist, many of them presume a substantial sample size. How should a transcriptome analysis be designed to obtain robust compound profiles, particularly for the small datasets frequently encountered in practice? This study addresses this question by investigating normalization procedures for transcriptome profiles, focusing on the baseline distribution used to derive biological responses as profiles. First, we investigated two large transcriptome datasets and compared the impact of different normalization procedures. By evaluating the similarity between response profiles of biological replicates within each dataset and the similarity between response profiles of the same compound across datasets, we found that the baseline distribution defined by all samples within each batch under batch-corrected conditions is a good choice for large datasets. Subsequently, we conducted a simulation to explore how the number of control samples influences the robustness of response profiles across datasets. The findings indicate that, ideally, the number of control samples should be 6 or greater in small datasets. We believe that this study enhances our understanding of how to effectively leverage transcriptome profiles of compounds and promotes the accumulation of essential knowledge for the practical application of such profiles.
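As a rough illustration of what choosing a baseline distribution can mean, the sketch below derives response profiles for one batch in two ways: against all samples in the batch and against control samples only. The abstract does not give the exact formula, so the per-gene z-score, the Pearson correlation used as the similarity measure, and the names `response_profiles` and `replicate_similarity` are all assumptions made for illustration, not the authors' implementation. The data are synthetic.

```python
import numpy as np


def response_profiles(expr, baseline_mask=None):
    """Z-score each gene against a baseline distribution (assumed formula).

    expr: array of shape (n_samples, n_genes) for a single batch.
    baseline_mask: boolean array selecting the baseline samples;
        None means all samples in the batch form the baseline.
    """
    expr = np.asarray(expr, dtype=float)
    if baseline_mask is None:
        base = expr
    else:
        base = expr[np.asarray(baseline_mask, dtype=bool)]
    mu = base.mean(axis=0)
    sd = base.std(axis=0, ddof=1)
    sd = np.where(sd > 0, sd, 1.0)  # avoid division by zero for constant genes
    return (expr - mu) / sd


def replicate_similarity(profile_a, profile_b):
    """Pearson correlation between two response profiles (assumed metric)."""
    return float(np.corrcoef(profile_a, profile_b)[0, 1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_genes = 200
    # Synthetic batch: 6 controls followed by 2 replicates of one compound.
    effect = rng.normal(scale=2.0, size=n_genes)
    controls = rng.normal(size=(6, n_genes))
    treated = effect + rng.normal(size=(2, n_genes))
    batch = np.vstack([controls, treated])
    is_control = np.array([True] * 6 + [False] * 2)

    for name, mask in [("all samples", None), ("controls only", is_control)]:
        z = response_profiles(batch, mask)
        sim = replicate_similarity(z[6], z[7])
        print(f"baseline = {name}: replicate similarity = {sim:.3f}")
```

Varying the number of control rows in such a toy setup is one simple way to see how a small control group makes the control-only baseline less stable, which is the kind of question the simulation in the abstract addresses.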

Publisher

Cold Spring Harbor Laboratory
