Abstract
RNA sequencing (RNA-seq) can be applied to diverse tasks, including quantifying gene expression, discovering quantitative trait loci, and identifying gene fusion events. Although RNA-seq can detect germline variants, the complexities of variable transcript abundance, target capture, and amplification introduce challenging sources of error. Here, we extend DeepVariant, a deep-learning-based variant caller, to learn and account for the unique challenges presented by RNA-seq data. Our DeepVariant RNA-seq model produces highly accurate variant calls from RNA-seq data and outperforms existing approaches such as Platypus and GATK. We examine the factors that influence accuracy, how our model addresses RNA editing events, and how additional thresholding can facilitate the model's use in a production pipeline.
Publisher
Cold Spring Harbor Laboratory