Abstract
The accuracy and robustness of many types of analyses performed using RNA-seq data are directly impacted by the quality of the transcript and gene abundance estimates inferred from this data. However, a certain degree of uncertainty is always associated with the transcript abundance estimates. This uncertainty may make many downstream analyses, such as differential testing, difficult for certain transcripts. Conversely, gene-level analysis, though less ambiguous, is often too coarse-grained. To circumvent this problem, methods have proposed grouping transcripts together into distinct inferential units that should be used as a base unit for analysis. However, these methods don’t take downstream analysis into account. We introduce TreeTerminus, a data-driven approach for grouping transcripts into a tree structure where leaves represent individual transcripts and internal nodes represent an aggregation of a transcript set. TreeTerminus constructs trees such that, on average, the inferential uncertainty decreases as we ascend the tree topology. The tree provides the flexibility to analyze data at nodes that are at different levels of resolution in the tree and can be tuned depending on the analysis of interest. To obtain fixed groups for the downstream analysis, we provide a dynamic programming (DP) approach that can be used to find a cut through the tree that optimizes one of several different objectives. We evaluated TreeTerminus on two simulated and two experimental datasets, and observed an improved performance compared to transcripts (leaves) and other methods under several different metrics.
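The idea of finding a cut through the tree by dynamic programming can be illustrated with a minimal sketch. This is not the TreeTerminus implementation: the `Node` class, the `best_cut` function, and the per-node `cost` values are hypothetical, and the objective (minimize the summed cost of the selected nodes) is only one assumed example of the "several different objectives" the abstract mentions. A cut here means a set of nodes such that every leaf transcript is covered by exactly one selected node.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Tree node; `cost` is a hypothetical per-node score (assumption)."""
    name: str
    cost: float
    children: List["Node"] = field(default_factory=list)


def best_cut(node: Node) -> Tuple[float, List[str]]:
    """Return (total cost, node names) of a minimum-cost cut below `node`.

    For each subtree, either select the node itself or combine the
    best cuts of its children, whichever has the lower total cost.
    """
    if not node.children:
        # A leaf (single transcript) can only be covered by itself.
        return node.cost, [node.name]

    child_total = 0.0
    child_cut: List[str] = []
    for child in node.children:
        cost, cut = best_cut(child)
        child_total += cost
        child_cut.extend(cut)

    # Ties are broken in favor of the coarser grouping (assumption).
    if node.cost <= child_total:
        return node.cost, [node.name]
    return child_total, child_cut


# Toy tree with made-up costs: ((t1, t2) -> g12, t3) -> root
t1, t2, t3 = Node("t1", 3.0), Node("t2", 4.0), Node("t3", 1.0)
g12 = Node("g12", 2.0, [t1, t2])
root = Node("root", 5.0, [g12, t3])

total, cut = best_cut(root)
print(total, cut)
```

In this toy tree, grouping `t1` and `t2` into `g12` costs less than keeping them separate, while merging everything at the root costs more than keeping `g12` and `t3` apart, so the sketch selects `g12` and `t3`. Each node is visited once, so the running time is linear in the number of nodes.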
Publisher
Cold Spring Harbor Laboratory