Abstract
Motivation: In high-throughput sequencing studies, sequencing depth, which quantifies the total number of reads, varies across samples. Unequal sequencing depth can obscure true biological signals of interest and prevent direct comparisons between samples. To remove variability due to differential sequencing depth, taxa counts are usually normalized before downstream analysis. However, most existing normalization methods scale counts using size factors that are sample specific but not taxa specific, which can result in over- or under-correction for some taxa.
Results: We developed TaxaNorm, a novel normalization method based on a zero-inflated negative binomial model. This method assumes the effects of sequencing depth on mean and dispersion vary across taxa. Incorporating the zero-inflation part can better capture the nature of microbiome data. TaxaNorm showed improved performance compared to existing methods with both simulated and real data and can aid in data interpretation and visualization.
Availability and implementation: The ‘TaxaNorm’ R package is freely available for download at https://github.com/wangziyue57/TaxaNorm and is available from CRAN.
Contact: wangziyue57@gmail.com
Supplementary information: Supplementary data are available at Bioinformatics online.
Publisher
Cold Spring Harbor Laboratory