Abstract
With the growing appreciation for the role of regulatory differences in evolution, researchers need to reliably quantify expression levels within and among species. However, for non-model organisms, genome assemblies and annotations are often unavailable or of inferior quality, biasing the inference of expression changes to an unknown extent. Here, we explore the possibility of mapping RNA-seq reads from diverged species to one high-quality reference genome. As a test case, we used a small primate phylogeny ranging from human to marmoset, spanning 12% nucleotide divergence. To distinguish the effects of sequence divergence and genome quality, we used in silico evolved genomes and existing genomes to simulate RNA-seq reads. These were then mapped to the genome of origin (self-mapping) as well as to one common reference (cross-mapping) to infer the quantification biases. We find that the bias due to cross-mapping is small for the closely related great apes (≤ 4% divergence), and that cross-mapping is preferable to self-mapping given current genome qualities. For closely related species, cross-mapping provides easy access, high power and a well-controlled false discovery rate for both the analysis of intra-species expression differences and the detection of relative differences between species. If divergence increases so that a substantial fraction of reads exceeds the limits of the mapper used, we find that gene-specific corrections and effect-size cutoffs can limit the bias before self-mapping becomes unavoidable. In summary, for the first time we systematically quantify biases in cross-species RNA-seq studies, providing guidance on best practices for these important evolutionary studies.
Publisher
Cold Spring Harbor Laboratory
Cited by
4 articles.