Abstract
ABSTRACTThe development of sequencing technologies to evaluate bacterial microbiota composition has allowed new insights into the importance of microbial ecology. However, the variety of methodologies used among amplicon sequencing workflows leads to uncertainty about best practices as well as reproducibility and replicability among microbiome studies. Using a bacterial mock community composed of 37 soil isolates, we performed a comprehensive methodological evaluation of 540 workflows, each with a different combination of methodological factors spanning sample preparation to bioinformatic analysis to define sources of artifacts that affect sensitivity, specificity, and biases in the resulting compositional profiles. Of the 540 workflows examined, those using the V4-V4 primer set enabled the highest level of concordance between the original mock community and resulting microbiome sequence composition. Use of a high-fidelity polymerase, or a lower-fidelity polymerase with increased PCR elongation time limited chimera formation. Bioinformatic pipelines presented a trade-off between the fraction of distinct community members identified (sensitivity) and fraction of correct sequences (specificity). DADA2 and QIIME2 assembled V4-V4 reads amplified by Taq polymerase resulted in the highest specificity (100%), but only identified 52% of mock community members. Using mothur to assemble and denoise V4-V4 reads resulted in detection of 75% of mock community members among the resulting sequences, albeit with marginally lower specificity (99.5%). Optimization of microbiome workflows is critical for accuracy and to support reproducibility and replicability among microbiome studies. These aspects will help reveal the guiding principles of microbial ecology and impact the translation of microbiome research to human and environmental health.
Publisher
Cold Spring Harbor Laboratory