Affiliation:
1. Department of Biostatistics and Department of Statistics, University of Washington, Health Sciences Building, 1959 NE Pacific St, Seattle WA 98195, USA
Abstract
Summary
Comparing ecological communities across environmental gradients can be challenging, especially when the number of different taxonomic groups in the communities is large. In this setting, community-level summaries called diversity indices are widely used to detect changes in the community ecology. However, estimation of diversity indices has received relatively little attention from the statistical community. The most common estimates of diversity are the maximum likelihood estimates of the parameters of a multinomial model, even though the multinomial model implies strict assumptions about the sampling mechanism. In particular, the multinomial model prohibits ecological networks, where taxa positively and negatively co-occur. In this article, we leverage models from the compositional data literature that explicitly account for co-occurrence networks and use them to estimate diversity. Instead of proposing new diversity indices, we estimate popular diversity indices under these models. While the methodology is general, we illustrate the approach for the estimation of the Shannon, Simpson, Bray–Curtis, and Euclidean diversity indices. We contrast our method to multinomial, low-rank, and nonparametric methods for estimating diversity indices. Under simulation, we find that the greatest gains of the method are in strongly networked communities with many taxa. Therefore, to illustrate the method, we analyze the microbiome of seafloor basalts based on a 16S amplicon sequencing dataset with 1425 taxa and 12 communities.
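For orientation, the multinomial plug-in estimates that the abstract describes as the most common approach can be sketched as follows. This is a minimal illustration of that baseline, not the network-aware method proposed in the article. The function names, the toy count vectors, and the conventions chosen here (natural logarithm for Shannon, sum of squared proportions for Simpson, Bray–Curtis and Euclidean computed on relative abundances) are assumptions made for the example.

```python
import math


def proportions(counts):
    """Multinomial maximum likelihood estimate: observed relative abundances."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return [c / total for c in counts]


def shannon(p):
    """Shannon index, -sum p_i log p_i (natural log; zero proportions contribute 0)."""
    return -sum(x * math.log(x) for x in p if x > 0)


def simpson(p):
    """Simpson index under the sum-of-squares convention, sum p_i^2."""
    return sum(x * x for x in p)


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(a - b) for a, b in zip(p, q)) / sum(a + b for a, b in zip(p, q))


def euclidean(p, q):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Hypothetical counts for two communities over the same four taxa.
community_a = [50, 30, 20, 0]
community_b = [25, 25, 25, 25]

p_a = proportions(community_a)
p_b = proportions(community_b)

print("Shannon (A, B):", shannon(p_a), shannon(p_b))
print("Simpson (A, B):", simpson(p_a), simpson(p_b))
print("Bray-Curtis:", bray_curtis(p_a, p_b))
print("Euclidean:", euclidean(p_a, p_b))
```

Because these estimates depend only on the observed proportions, they treat taxa as if sampled independently given the totals, which is the assumption the abstract identifies as ruling out co-occurrence networks.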
Funder
National Institute of General Medical Sciences of the National Institutes of Health
Publisher
Oxford University Press (OUP)
Subject
Statistics, Probability and Uncertainty; General Medicine; Statistics and Probability
Reference
47 articles.
Cited by
90 articles.