Author:
Fuller Daniel T., Mondal Sumona, Sur Shantanu, Pal Nabendu
Abstract
Microbiomes are of vital importance for understanding human and environmental health. However, quantifying microbial composition remains challenging and relies on statistical modeling of either the raw taxonomic counts or the relative abundances. Relative abundances are commonly preferred over absolute counts for analyzing and interpreting microbiome data (because the sampling fractions are unknown in sequence data), but there is currently no ideal distribution for carrying out this modeling. In this work, the Dirichlet distribution is proposed to model the relative abundances of taxa directly, without any further transformation. In a comprehensive simulation study, we compared the biases and standard errors of two method-of-moments estimators (MMEs) and the maximum likelihood estimator (MLE) of the Dirichlet distribution. The estimators are compared across three cases of differing sample size and dimension: (i) small dimension and small sample size; (ii) small dimension and large sample size; (iii) large dimension with both large and small sample sizes. We demonstrate the Dirichlet modeling methodology with four real-world microbiome datasets and show how the results of the Dirichlet model differ from those obtained by a commonly used method, namely Bayesian Dirichlet-Multinomial estimation (BDME). We find that the results of parameter estimation can depend on the sequencing depth and sequencing technique used to produce a given microbiome dataset. However, for all datasets, the Dirichlet MLE (DMLE) results are comparable to the BDME results while requiring less computational time in each case.
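As a rough illustration of what moment-based estimation of Dirichlet parameters from relative abundances can look like, the sketch below matches sample means and variances to their Dirichlet counterparts, using E[X_j] = α_j/α_0 and Var(X_j) = E[X_j](1 − E[X_j])/(α_0 + 1). This is one standard moment-matching scheme and is not necessarily either of the two MMEs compared in the paper; the function names `sample_dirichlet` and `dirichlet_mme` are hypothetical, and the simulated data stand in for real relative-abundance vectors.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) vector by normalizing Gamma(alpha_j, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def dirichlet_mme(samples):
    """Moment-matching estimate of Dirichlet parameters (illustrative sketch).

    samples: list of rows, each a relative-abundance vector summing to 1.
    Assumes at least two rows and non-degenerate variation in every column.
    """
    n = len(samples)
    k = len(samples[0])
    # Sample mean of each taxon's relative abundance.
    means = [sum(row[j] for row in samples) / n for j in range(k)]
    # Unbiased sample variance of each taxon's relative abundance.
    variances = [
        sum((row[j] - means[j]) ** 2 for row in samples) / (n - 1)
        for j in range(k)
    ]
    # Each component yields its own estimate of alpha_0 = sum(alpha):
    # alpha_0 = m_j * (1 - m_j) / v_j - 1. Averaging them is one simple choice.
    alpha0_estimates = [
        means[j] * (1.0 - means[j]) / variances[j] - 1.0 for j in range(k)
    ]
    alpha0 = sum(alpha0_estimates) / k
    # Since E[X_j] = alpha_j / alpha_0, scale the means by alpha_0.
    return [m * alpha0 for m in means]
```

For example, calling `dirichlet_mme` on a few thousand rows produced by `sample_dirichlet([2.0, 3.0, 5.0], random.Random(0))` should return values near the generating parameters. Estimates of this kind are also a common starting point for iterative maximum likelihood fitting.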
Publisher
Cold Spring Harbor Laboratory