Abstract
Varying technologies and experimental approaches used in microbiome studies often lead to irreproducible results due to unwanted technical variations. Such variations, often unaccounted for and of unknown source, may interfere with true biological signals, resulting in misleading biological conclusions. In this work, we aim to characterize the major sources of technical variation in microbiome data and demonstrate how a state-of-the-art approach can minimize their impact on downstream analyses. We analyzed 184 pig faecal metagenomes encompassing 21 specific combinations of deliberately introduced factors of technical and biological variation. We identified several known experimental factors, specifically storage conditions and freeze-thaw cycles, as likely major sources of unwanted variation in metagenomes. We also observed that these unwanted technical variations do not affect taxa uniformly; for example, freezing samples affected taxa of class Bacteroidia the most. Additionally, we benchmarked the performance of a novel batch-correction tool used in this study, RUV-III-NB (https://github.com/limfuxing/ruvIIInb/), against other popular batch-correction methods, including ComBat, ComBat-seq, RUVg, and RUVs. While RUV-III-NB performed robustly and consistently across our sensitivity and specificity metrics, most other methods did not remove unwanted variations optimally, with RUVg even overcorrecting and removing some of the true biological signals from the samples. Our analyses suggest that careful consideration of possible technical confounders is critical in the experimental design of microbiome studies to ensure accurate biological readouts for microbial taxa of interest, and that the inclusion of technical replicates is necessary to remove unwanted variations computationally in an efficient manner.
Publisher
Cold Spring Harbor Laboratory