Abstract
Background: Transposable elements (TEs) constitute a significant portion of mammalian genomes, accounting for about 50% of the total DNA. Intragenic TEs are of particular interest because they are co-transcribed with their host genes in pre-mRNA, potentially leading to the formation of novel chimeric transcripts and the exonization of TEs. The abundance of RNA sequencing data currently available offers a unique opportunity to explore transcriptomic variations. However, existing computational tools are limited in their capability to do so. Here, we introduce FREDDIE, an algorithm designed to detect the exonization of retrotransposable elements using RNA-seq data. FREDDIE can process short and long RNA sequencing data, assemble and quantify transcripts, evaluate coding potential, and identify protein domains in chimeric transcripts involving exonized TEs and retrocopies.

Results: To demonstrate the efficacy of FREDDIE, we analyzed and validated TE exonization in two human cancer cell lines, K562 and U251. We identified 322 chimeric transcripts, of which 126 were from K562 and 196 were from U251. Among these chimeric transcripts, 35 showed similar exonization patterns and host genes. These transcripts involve protein-coding host genes and the exonization of LINE-1 (L1) elements, Alu elements, and retrocopies of coding genes. We selected some candidates and validated them experimentally through RT-PCR. The validation rate for these candidates was 70%, later confirmed by long-read sequencing. Additionally, we applied FREDDIE to analyze TE exonization across 157 glioblastoma samples, identifying 1,010 chimeric transcripts. The majority of these transcripts involved the exonization of Alu elements (69.8%), followed by L1 (20.6%) and retrocopies (9.6%). Notably, we discovered a highly expressed L1 exonization within the ROS gene, resulting in a truncated open reading frame (ORF) with the deletion of two protein domains.

Conclusions: FREDDIE is an efficient and user-friendly tool for identifying chimeric transcripts that involve exonization of intragenic TEs. Overall, FREDDIE enables comprehensive investigations into the contributions of TEs to transcriptome evolution, variation, and disease-associated abnormalities, and it operates effectively on standard computing systems. FREDDIE is publicly available: https://github.com/galantelab/freddie
Publisher
Cold Spring Harbor Laboratory