Abstract
Amplicon sequencing has revolutionized our ability to study DNA collected from environmental samples by providing a rapid and sensitive technique for microbial community analysis that eliminates the challenges associated with lab cultivation and taxonomic identification through microscopy. In water resources management, it can be especially useful to evaluate ecosystem shifts in response to natural and anthropogenic landscape disturbances to signal potential water quality concerns, such as the detection of toxic cyanobacteria or pathogenic bacteria. Amplicon sequencing data consist of discrete counts of sequence reads, the sum of which is the library size. Groups of samples typically have different library sizes that are not representative of biological variation; library size normalization is required to meaningfully compare diversity between them. Rarefaction is a widely used normalization technique that involves the random subsampling of sequences from the initial sample library to a selected normalized library size. Rarefying is often dismissed as statistically invalid because subsampling effectively discards a portion of the observed sequences. Nonetheless, it remains prevalent in practice. Notably, the superiority of rarefying relative to many other normalization approaches has been argued in diversity analysis. Here, repeated rarefying is proposed as a tool for diversity analyses to normalize library sizes. This enables (i) proportionate representation of all observed sequences and (ii) characterization of the random variation introduced to diversity analyses by rarefying to a smaller library size shared by all samples. While many deterministic data transformations are not tailored to produce equal library sizes, repeatedly rarefying reflects the probabilistic process by which amplicon sequencing data are obtained as a representation of the source microbial community.
Specifically, it evaluates which data might have been obtained if a particular sample’s library size had been smaller and allows graphical representation of the effects of this library size normalization process upon diversity analysis results.
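As an illustration only, repeated rarefying of a single sample can be sketched as follows. This is a minimal sketch and not the authors' implementation: it assumes subsampling without replacement, uses the Shannon index as the diversity measure, and the read counts, the normalized library size of 200, and the function names `rarefy_repeatedly` and `shannon` are all hypothetical.

```python
import numpy as np

def rarefy_repeatedly(counts, depth, n_repeats=100, seed=0):
    """Draw n_repeats random subsamples of `depth` reads without replacement."""
    counts = np.asarray(counts, dtype=np.int64)
    if depth > counts.sum():
        raise ValueError("depth exceeds the sample's library size")
    rng = np.random.default_rng(seed)
    # Each row is one rarefied version of the sample; every row sums to depth.
    return rng.multivariate_hypergeometric(counts, depth, size=n_repeats)

def shannon(counts):
    """Shannon index (natural log) of one vector of counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

# Hypothetical read counts for five sequence variants (library size 1000).
sample = [500, 300, 150, 40, 10]
rarefied = rarefy_repeatedly(sample, depth=200, n_repeats=50)

# The spread of the index across repeats reflects the random variation
# introduced by rarefying to the smaller library size.
values = np.array([shannon(row) for row in rarefied])
print(values.mean(), values.std())
```

Summarizing or plotting the distribution of `values`, rather than relying on a single subsample, is one way to show how much a diversity result depends on the random draw.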
Publisher
Cold Spring Harbor Laboratory
Cited by
24 articles.