Affiliation:
1. Laboratory of Biological System Modeling, Center of Technological Development in Health (CDTS), Oswaldo Cruz Foundation (Fiocruz), Rio de Janeiro 21040-900, RJ, Brazil
Abstract
RNA-seq faces persistent challenges because the array of data processing workflows keeps expanding and none has yet been standardized. It is therefore imperative to determine which method most effectively preserves biological facts. Here, we used Shannon entropy as a tool for describing the biological status of a system. We assessed how Shannon entropy is measured by several RNA-seq workflows, such as DESeq2 and edgeR, and by combinations of nine normalization methods with log2 fold change, on paired TCGA RNA-seq samples from 515 patients spanning 12 cancer types with 5-year overall survival rates ranging from 20% to 98%. Our analysis revealed that TPM, RLE, and TMM normalization, coupled with a log2 fold change threshold of ≥1 for identifying differentially expressed genes, yielded the best results. We propose that Shannon entropy can serve as an objective metric for optimizing RNA-seq workflows and mRNA sequencing technologies.
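As a rough illustration of the two quantities named in the abstract, the sketch below computes a log2 fold change on one paired tumor/control sample, keeps genes with log2 fold change ≥1, and evaluates the Shannon entropy H = -Σ p·log2(p) of a discrete distribution. The abstract does not state which distribution the entropy is computed over, so the choice made here (each selected gene's share of the tumor expression) is purely an assumption for demonstration, as are the pseudocount, the function names, and the toy values. It is not the authors' pipeline.

```python
import math


def log2_fold_change(tumor, control, pseudocount=1.0):
    """log2 ratio of tumor to control expression for one gene.

    The pseudocount (an assumption, not from the abstract) avoids
    division by zero and log of zero.
    """
    return math.log2((tumor + pseudocount) / (control + pseudocount))


def select_upregulated(tumor, control, threshold=1.0):
    """Return genes whose paired log2 fold change is >= threshold."""
    return [
        gene
        for gene in tumor
        if log2_fold_change(tumor[gene], control[gene]) >= threshold
    ]


def shannon_entropy(weights):
    """Shannon entropy (in bits) of non-negative weights, normalized to sum to 1."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log2(p) for p in probs)


# Hypothetical normalized expression values for one patient's paired samples.
tumor = {"geneA": 7.0, "geneB": 7.0, "geneC": 3.0, "geneD": 0.0}
control = {"geneA": 1.0, "geneB": 3.0, "geneC": 3.0, "geneD": 5.0}

selected = select_upregulated(tumor, control, threshold=1.0)

# Assumption: the entropy is taken over the selected genes' tumor expression shares.
entropy = shannon_entropy([tumor[gene] for gene in selected])

print(selected, entropy)
```

In a workflow comparison, the same calculation would be repeated for each normalization method (for example TPM, RLE, or TMM values in place of the toy numbers) so that the resulting entropies can be compared across methods.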
Funder
Fundação Carlos Chagas Filho de Amparo à Pesquisa do Estado do Rio de Janeiro - FAPERJ