Abstract
Endogenous transposable elements (TEs) are implicated in human diseases due to their propensity to compromise genome integrity. Although short-read sequencing is now frequently used to examine TE expression, the highly repetitive nature of TEs limits their accurate quantification at the locus-specific level. We have developed LocusMasterTE, an improved method that integrates information from long-read RNA sequencing to enhance TE quantification. Fractional transcripts per million (TPM) values from long reads serve as a prior distribution in the Expectation-Maximization (EM) model used for short-read TE quantification, thereby guiding the reassignment of multi-mapped reads and yielding more accurate expression values. Using simulated short reads, our results indicate that LocusMasterTE outperforms existing quantitative approaches and is especially advantageous for quantifying evolutionarily younger TEs. Using matched cell line RNA-seq data, we further demonstrate improved locus-specific TE quantification by LocusMasterTE, with stronger enrichment at active histone marks and depletion at repressive histone marks. Finally, by integrating colorectal cancer cell line long-read sequencing data with short-read RNA-seq data from The Cancer Genome Atlas colorectal cancer cohort, we demonstrate LocusMasterTE’s ability to identify survival-related TEs and uncover new expression associations between locus-specific TEs and neighboring genes. By providing more accurate quantification, LocusMasterTE offers the potential to reveal novel functions of TE transcripts.
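To illustrate the general idea of using a long-read prior inside an EM loop, the following is a minimal sketch and not the LocusMasterTE implementation. It assumes each short read is represented only by the list of loci it aligns to, that the long-read prior enters as a simple multiplicative weight in the E-step, and that alignment scores are ignored. The function name `em_quantify`, the locus labels, and the prior values are all hypothetical.

```python
from typing import Dict, List


def em_quantify(
    reads: List[List[str]],
    prior: Dict[str, float],
    n_iter: int = 100,
    eps: float = 1e-9,
) -> Dict[str, float]:
    """Estimate per-locus expression fractions from multi-mapped reads.

    reads: for each read, the loci it aligns to (hypothetical input format).
    prior: per-locus weight, e.g. a fractional TPM from long reads
           (assumed to act as a multiplicative weight in the E-step).
    """
    loci = sorted({locus for read in reads for locus in read})
    # Start from a uniform estimate of the expression fractions.
    theta = {locus: 1.0 / len(loci) for locus in loci}

    for _ in range(n_iter):
        counts = {locus: 0.0 for locus in loci}
        # E-step: split each read across its candidate loci in proportion
        # to current abundance times the long-read prior weight.
        for read in reads:
            weights = {
                locus: theta[locus] * (prior.get(locus, 0.0) + eps)
                for locus in read
            }
            total = sum(weights.values())
            for locus, weight in weights.items():
                counts[locus] += weight / total
        # M-step: re-estimate fractions from the expected read counts.
        theta = {locus: counts[locus] / len(reads) for locus in loci}

    return theta


# Toy example: two reads are ambiguous between loci "A" and "B".
toy_reads = [["A"], ["A", "B"], ["A", "B"], ["B"]]
with_prior = em_quantify(toy_reads, prior={"A": 0.9, "B": 0.1})
no_prior = em_quantify(toy_reads, prior={"A": 1.0, "B": 1.0})
```

In this toy case the uninformative prior leaves the two loci tied, while the prior favoring "A" shifts the ambiguous reads toward "A". A real quantifier would also need alignment scores, effective lengths, and a convergence check, none of which are modeled here.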
Publisher
Cold Spring Harbor Laboratory