Abstract
The integration of viruses into the human genome is known to be associated
with tumorigenesis in many cancers, but accurately detecting integration
breakpoints from short-read sequencing data is complicated by human-viral
sequence homology, viral genome heterogeneity, limited coverage, and other factors.
To address this, we present Exogene, a sensitive and efficient workflow for
detecting viral integrations from paired-end next-generation sequencing data.
Exogene’s read filtering and breakpoint detection strategies yield integration
coordinates that are highly concordant with those found in long-read validation
sets. We demonstrate this concordance across six TCGA hepatocellular carcinoma
(HCC) tumor samples, in which we identify hepatitis B virus integrations that are
validated by long reads. Additionally, we applied Exogene to targeted capture
data from 426 previously studied HCC samples, achieving 98.9% concordance with
existing methods and identifying 238 high-confidence integrations that were not
previously reported. Exogene is applicable to multiple types of paired-end
sequencing data, including whole-genome, exome, RNA-Seq, and targeted capture data.
Publisher
Cold Spring Harbor Laboratory