Abstract
Alternative splicing (AS) is a fundamental mechanism that regulates gene expression. Splicing dynamics are involved in both physiological and pathological processes. In this paper, we introduce ASTK, a software package covering upstream and downstream analysis of AS. First, ASTK offers a module that performs enrichment analysis at both the gene and exon levels, to account for the differing impacts that distinct splicing events can have on a single gene. We further cluster AS genes and alternative exons into three groups based on spliced exon size (micro-, mid-, and macro-), which are preferentially associated with distinct biological pathways. A major challenge in the field has been decoding the regulatory code of splicing. ASTK extracts both sequence features and epigenetic marks associated with AS events. By applying machine learning algorithms, we identified pivotal features influencing the inclusion levels of most AS types. Notably, splice site strength is a primary determinant of inclusion levels for alternative 3’/5’ splice site (A3/A5) events. For the alternative first exon (AF) and skipped exon (SE) classes, a combination of sequence and epigenetic features dictates exon inclusion or exclusion. Our findings underscore ASTK’s capability to enhance the functional understanding of AS events and shed light on the intricacies of splicing regulation.
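The size-based grouping of alternative exons mentioned in the abstract can be illustrated with a minimal sketch. This is not ASTK's code or API: the function names, the half-open coordinate convention, and especially the two length cutoffs (`MICRO_MAX`, `MID_MAX`) are assumptions made for illustration, since the abstract does not state the thresholds used.

```python
# Hypothetical cutoffs in nucleotides. The abstract does not give the values
# ASTK uses, so these are placeholders exposed as parameters.
MICRO_MAX = 30
MID_MAX = 300


def size_class(start, end, micro_max=MICRO_MAX, mid_max=MID_MAX):
    """Classify one exon by length; coordinates are assumed half-open [start, end)."""
    length = end - start
    if length <= 0:
        raise ValueError("exon end must be greater than exon start")
    if length <= micro_max:
        return "micro"
    if length <= mid_max:
        return "mid"
    return "macro"


def group_exons(exons):
    """Group (name, start, end) tuples into micro-, mid-, and macro-exon lists."""
    groups = {"micro": [], "mid": [], "macro": []}
    for name, start, end in exons:
        groups[size_class(start, end)].append(name)
    return groups


# Illustrative input only; these are not real events.
example = [("exonA", 1000, 1021), ("exonB", 5000, 5150), ("exonC", 9000, 9800)]
grouped = group_exons(example)
```

Keeping the cutoffs as parameters makes it easy to substitute whatever boundaries a given analysis actually defines.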
Publisher
Cold Spring Harbor Laboratory