Abstract
Alu exonization, or the recruitment of intronic Alu elements into gene sequences, has contributed to functional diversification; however, its extent and the ways in which it influences gene regulation are not fully understood. We developed an unbiased approach to predict Alu exonization events from genomic sequences, implemented in a deep learning model, eXAlu, that overcomes the limitations of tissue or condition specificity and the computational burden of RNA-seq analysis. The model captures previously reported characteristics of exonized Alu sequences and can predict sequence elements important for Alu exonization. Using eXAlu, we estimate the number of Alu elements in the human genome undergoing exonization to be between 55K and 110K, 11- to 21-fold more than represented in the GENCODE gene database. Using RT-PCR, we validated selected predicted Alu exonization events, supporting the accuracy of our method. Lastly, we highlight a potential application of our method to identify polymorphic Alu insertion exonizations in individuals and in the population from whole genome sequencing data.
Publisher
Cold Spring Harbor Laboratory