Abstract
Allelic imbalance (AI) of gene expression in heterozygous individuals is a hallmark of cis-genetic regulation, revealing mechanisms underlying the association of non-coding genetic variation with downstream traits, as in GWAS. Most methods for detecting AI from RNA-sequencing (RNA-seq) data examine allelic expression per exonic SNP, which may obscure imbalance in expression of individual isoforms. Detecting AI at the isoform level requires accounting for inferential uncertainty (IU) of expression estimates, caused by multi-mapping of RNA-seq reads to isoforms and alleles. Swish, a method developed previously for differential transcript expression accounting for IU, can be applied in a paired setting to detect AI. However, in AI analysis, most transcripts will have high IU across alleles such that even methods like Swish will lose power. Our proposed method, SEESAW, offers AI analysis at various levels of resolution: at the gene level, at the isoform level, and optionally at the level of isoforms aggregated within a gene by their transcription start site (TSS). This TSS-based aggregation strategy strengthens the signal for transcripts that may have high IU with respect to allelic quantification. SEESAW is primarily designed for experiments with multiple replicates or conditions of organisms with the same genotype, as in an F1 cross or time course experiments of cell lines. Additionally, we introduce a new test for detecting AI that changes across a continuous covariate, as in a time course experiment. The SEESAW suite of methods is evaluated on simulated data and applied to an RNA-seq dataset of differentiating F1 mouse osteoblasts.
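The idea behind TSS-based aggregation can be illustrated with a minimal sketch. This is not the SEESAW implementation: the function names, the input layout, and the toy counts are all hypothetical, and the sketch ignores inferential replicates and statistical testing. It only shows how summing allelic counts over isoforms that share a TSS yields one allelic ratio per TSS group.

```python
from collections import defaultdict


def aggregate_by_tss(counts, tss_of):
    """Sum per-isoform allelic counts into per-TSS groups.

    counts: {isoform_id: (ref_count, alt_count)}   # hypothetical layout
    tss_of: {isoform_id: (gene_id, tss_position)}  # hypothetical layout
    """
    grouped = defaultdict(lambda: [0.0, 0.0])
    for isoform, (ref, alt) in counts.items():
        key = tss_of[isoform]
        grouped[key][0] += ref
        grouped[key][1] += alt
    return {key: tuple(vals) for key, vals in grouped.items()}


def allelic_ratio(ref, alt):
    """Fraction of expression from the reference allele (NaN if no counts)."""
    total = ref + alt
    return ref / total if total > 0 else float("nan")


# Toy data (invented for illustration): tx1 and tx2 share a TSS, tx3 does not.
counts = {"tx1": (30.0, 10.0), "tx2": (20.0, 20.0), "tx3": (5.0, 15.0)}
tss_of = {
    "tx1": ("geneA", 1000),
    "tx2": ("geneA", 1000),
    "tx3": ("geneA", 2500),
}

aggregated = aggregate_by_tss(counts, tss_of)
ratios = {key: allelic_ratio(ref, alt) for key, (ref, alt) in aggregated.items()}
```

In this toy case the two isoforms that share a TSS are pooled into one group, so reads that cannot be assigned confidently to either isoform still contribute to the allelic signal for that TSS.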
Publisher
Cold Spring Harbor Laboratory