Affiliation:
1. Department of Biostatistics, University of Washington, Hans Rosling Center for Population Health, Box 351617, Seattle, WA 98195-1617, USA
Abstract
Summary
High-throughput sequencing is widely used to study microbial communities. However, choice of laboratory protocol is known to affect the resulting microbiome data, which has an unquantified impact on many comparisons between communities of scientific interest. We propose a novel approach to evaluating replicability in high-dimensional data and apply it to assess the cross-laboratory replicability of signals in microbiome data using the Microbiome Quality Control Project data set. We learn distinctions between samples as measured by a single laboratory and evaluate whether the same distinctions hold in data produced by other laboratories. While most sequencing laboratories can consistently distinguish between samples (median correct classification 87% on genus-level proportion data), these distinctions frequently fail to hold in data from other laboratories (median correct classification 55% across laboratories on genus-level proportion data). As identical samples processed by different laboratories generate substantively different quantitative results, we conclude that 16S sequencing does not reliably resolve differences in human microbiome samples. However, because we observe greater replicability under certain data transformations, our results inform the analysis of microbiome data.
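The train-within-one-laboratory, test-across-laboratories idea described in the summary can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the count vectors are invented toy numbers (not Microbiome Quality Control Project data), the laboratory names and labels are hypothetical, and a nearest-centroid rule stands in for whatever classifier the authors actually used.

```python
from math import fsum


def to_proportions(counts):
    """Convert a vector of taxon counts to relative abundances."""
    total = sum(counts)
    return [c / total for c in counts]


def fit_centroids(samples, labels):
    """Average the proportion vectors of each label (nearest-centroid 'training')."""
    grouped = {}
    for x, y in zip(samples, labels):
        grouped.setdefault(y, []).append(x)
    return {
        y: [fsum(col) / len(rows) for col in zip(*rows)]
        for y, rows in grouped.items()
    }


def predict(centroids, x):
    """Return the label whose centroid is closest to x in squared Euclidean distance."""
    def sq_dist(c):
        return fsum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def accuracy(centroids, samples, labels):
    """Fraction of samples assigned their true label."""
    hits = sum(predict(centroids, x) == y for x, y in zip(samples, labels))
    return hits / len(labels)


# Invented toy counts over three genera for two specimens, "s1" and "s2".
labels = ["s1", "s2"]
lab1_train = [to_proportions(c) for c in ([500, 300, 200], [300, 500, 200])]
lab1_test = [to_proportions(c) for c in ([260, 140, 100], [140, 260, 100])]
# Hypothetical second laboratory that over-detects the first genus.
lab2_test = [to_proportions(c) for c in ([800, 120, 80], [632, 263, 105])]

centroids = fit_centroids(lab1_train, labels)          # learn from laboratory 1 only
within_acc = accuracy(centroids, lab1_test, labels)    # same laboratory, new replicates
cross_acc = accuracy(centroids, lab2_test, labels)     # same specimens, other laboratory

print(f"within-laboratory accuracy: {within_acc:.2f}")
print(f"cross-laboratory accuracy:  {cross_acc:.2f}")
```

In this toy setup the distinction learned from the first laboratory separates its own replicates but misclassifies one specimen as measured by the second laboratory, which is the kind of gap the summary reports between within-laboratory and cross-laboratory classification.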
Funder
The National Institute of General Medical Sciences
NIGMS
NIH
National Institute of Environmental Health Sciences
Seattle Chapter of the ARCS Foundation
ARCS Foundation
Publisher
Oxford University Press (OUP)
Subject
Statistics, Probability and Uncertainty; General Medicine; Statistics and Probability
Cited by
17 articles.