Abstract
In many high-throughput sequencing (HTS) studies, sample-to-sample variation in sequencing depth is driven not by variation in the scale (e.g., total size, microbial load, or total gene expression) of the underlying biological systems being measured but by technical factors. Typically, this technical variation is addressed using some form of statistical normalization. Normalizations are data or parameter transformations that remove unwanted technical variation in the hope of facilitating analyses that are sensitive to scale, such as differential abundance and differential expression analyses. Recently, we showed that any normalization makes implicit assumptions about the unmeasured system scale and that errors in these assumptions can lead to dramatic increases in false positive and false negative rates. Here we describe updates to the ALDEx2 R package that mitigate these problems by directly modeling uncertainty in the unmeasured system scale through the use of a scale model. Scale models generalize the idea of normalizations and can be thought of as explicitly modeling potential error in the chosen normalization. Beyond enhancing the robustness of HTS analyses, the use of scale models within ALDEx2 enhances the transparency and reproducibility of analyses by making implicit normalizing assumptions an explicit part of the model-building process.
Publisher
Cold Spring Harbor Laboratory