Abstract
Traditional bulk RNA-Seq pipelines do not assess cell-type composition within heterogeneous tissues. It is therefore difficult to determine whether conflicting findings among samples or datasets result from biological differences or from technical differences caused by variation in sample collection. This report provides a user-friendly, open-source method to assess cell-type composition in bulk RNA-Seq datasets from heterogeneous tissues, using published single-cell (sc)RNA-Seq data as a reference. As an example, we apply the method to kidney cortex bulk RNA-Seq data from female (N=8) and male (N=9) baboons to assess whether observed transcriptome sex differences are biological or technical, i.e., due to variation in ultrasound-guided biopsy collections. We found that cell-type composition was not statistically different between female and male transcriptomes, based on expression of 274 kidney cell-type-specific transcripts, indicating that the differences in gene expression are not due to sampling differences. This method of cell-type composition analysis is recommended for adding rigor to the analysis of bulk RNA-Seq datasets from complex tissues. As costs fall, more analyses will be done using scRNA-Seq; however, the approach described here remains relevant for data mining and meta-analyses of the thousands of bulk RNA-Seq datasets archived in the NCBI GEO public database.
Author Summary
This method, which offers a simple way to assess sampling biases in bulk RNA-Seq datasets by evaluating cell-type composition, will aid researchers in assessing whether bulk RNA-Seq data from different studies of the same heterogeneous tissue are comparable. The additional layer of information can help determine whether observed differential gene expression is biological or technical, i.e., due to cell composition variation among study samples.
The described method uses publicly available bioinformatics resources and does not require coding expertise or high-capacity computational processing. Development of tools accessible to scientists without computing expertise will contribute to greater rigor and reproducibility for bioinformatic analyses of transcriptome data.
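For readers who do prefer a scripted check, the idea of comparing cell-type composition between two sample groups can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in this report: the marker names, the toy expression values, the use of a mean marker expression score per cell type, and the permutation test are all hypothetical choices made for the example.

```python
import random
from statistics import mean


def composition_scores(expr, markers):
    """Mean expression of each cell type's marker genes, per sample.

    expr:    {sample: {gene: normalized expression}}
    markers: {cell_type: [marker genes]}  (assumed to come from an scRNA-Seq reference)
    """
    return {
        sample: {
            cell_type: mean(genes.get(g, 0.0) for g in marker_genes)
            for cell_type, marker_genes in markers.items()
        }
        for sample, genes in expr.items()
    }


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (hits + 1) / (n_perm + 1)


# Hypothetical toy data: two marker genes for one cell type, four samples.
toy_markers = {"proximal_tubule": ["PT_marker_1", "PT_marker_2"]}
toy_expr = {
    "F1": {"PT_marker_1": 5.1, "PT_marker_2": 4.9},
    "F2": {"PT_marker_1": 5.3, "PT_marker_2": 5.0},
    "M1": {"PT_marker_1": 5.0, "PT_marker_2": 5.2},
    "M2": {"PT_marker_1": 5.2, "PT_marker_2": 4.8},
}
toy_scores = composition_scores(toy_expr, toy_markers)
female = [toy_scores[s]["proximal_tubule"] for s in ("F1", "F2")]
male = [toy_scores[s]["proximal_tubule"] for s in ("M1", "M2")]
toy_p = permutation_p_value(female, male)
```

A large p-value for every cell type would be consistent with similar composition across groups, although with few samples such a test has limited power and the result should be read cautiously.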
Publisher
Cold Spring Harbor Laboratory