Abstract
Linking sequence-derived microbial taxa abundances to host (patho-)physiology or habitat characteristics in a reproducible and interpretable manner has remained a formidable challenge for the analysis of microbiome survey data. Here, we introduce a flexible probabilistic modeling framework, VI-MIDAS (Variational Inference for MIcrobiome survey DAta analysiS), that enables joint estimation of context-dependent drivers and broad patterns of associations of microbial taxon abundances from microbiome survey data. VI-MIDAS comprises mechanisms for direct coupling of taxon abundances with covariates and for taxa-specific latent coupling, which can incorporate spatio-temporal information and taxon-taxon interactions. We leverage mean-field variational inference for posterior VI-MIDAS model parameter estimation and illustrate model building and analysis using Tara Ocean Expedition survey data. Using VI-MIDAS’ latent embedding model and tools from network analysis, we show that marine microbial communities can be broadly categorized into five modules, including SAR11-, Nitrosopumilus-, and Alteromonadales-dominated communities, each associated with specific environmental and spatiotemporal signatures. VI-MIDAS also finds evidence for largely positive taxon-taxon associations in SAR11 or Rhodospirillales clades, and negative associations with Alteromonadales and Flavobacteriales classes. Our results indicate that VI-MIDAS provides a powerful integrative statistical analysis framework for discovering broad patterns of associations between microbial taxa and context-specific covariate data from microbiome survey data.
Publisher
Cold Spring Harbor Laboratory