Abstract
Background
To identify operational taxonomic units (OTUs) that signal disease onset in an observational study, a powerful strategy is to select participants in matched sets, profile their metagenomes over time, and then analyze the temporal trajectories. Existing trajectory analyses model either individual OTUs or the whole microbial community. They do not adjust for within-community correlation or for latent factors specific to each matched set.
Results
We proposed a joint model with matching and regularization (JMR) to detect OTU-specific compositional trajectories predictive of host disease status. JMR uses nested random effects and covariate taxa pre-selected by Bray-Curtis distance and elastic net regression. The inherent negative correlation in microbiota composition is adjusted for by including the top-correlated taxa as covariates. We designed a simulation pipeline that generates true biomarkers of disease onset as well as pseudo-biomarkers arising from compositionality or latent noise. In a simulation study that generated temporal, high-dimensional metagenomic counts with a random intercept or random slope, we demonstrated that JMR effectively controlled false discoveries and pseudo-biomarkers. Applying JMR and the competing methods to the simulated data and to the TEDDY cohort showed that JMR outperformed the other methods and identified important taxa in infants' fecal samples whose dynamics preceded host disease status.
Conclusion
Our method, JMR, is a robust framework that models taxon-specific compositional trajectories and host disease status in matched participants. In certain scenarios, it improves the power to detect disease-predictive microbial features.
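The two ideas behind the covariate step above can be illustrated with a small sketch. First, closing counts to relative abundances creates spurious co-movement: a taxon whose absolute count never changes appears to decline when another taxon blooms, which is how compositionality produces pseudo-biomarkers. Second, candidate covariate taxa can be ranked by Bray-Curtis dissimilarity. This is a minimal sketch under stated assumptions, not the JMR implementation. For illustration only, we assume that the dissimilarity is computed between taxa's relative-abundance profiles across samples and that the k closest taxa are kept. The helper `top_covariate_taxa` and the toy data are hypothetical, and the elastic net stage and the mixed model are omitted.

```python
from typing import Dict, List, Sequence


def to_relative(counts: Sequence[float]) -> List[float]:
    """Close one sample's taxon counts to proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no reads")
    return [c / total for c in counts]


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i).

    Lies in [0, 1] for non-negative inputs; 0 means identical profiles.
    """
    if len(u) != len(v):
        raise ValueError("vectors must have equal length")
    denom = sum(a + b for a, b in zip(u, v))
    if denom == 0:
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


def top_covariate_taxa(profiles: Dict[str, List[float]], target: str, k: int) -> List[str]:
    """Hypothetical pre-selection: rank the other taxa by Bray-Curtis
    dissimilarity of their per-sample profiles to the target's profile and
    return the k closest (ties broken by name for determinism)."""
    ref = profiles[target]
    others = [name for name in profiles if name != target]
    others.sort(key=lambda name: (bray_curtis(ref, profiles[name]), name))
    return others[:k]


if __name__ == "__main__":
    # Compositionality: taxon A keeps 100 reads, but taxon B blooms.
    before = to_relative([100, 100])  # [A, B]
    after = to_relative([100, 900])
    print(before[0], after[0])  # 0.5 0.1 -> A "declines" without changing

    # Toy data (hypothetical): rows are samples, columns are taxa.
    taxa = ["t1", "t2", "t3"]
    counts = [[10, 20, 70], [20, 20, 60], [30, 20, 50]]
    rel = [to_relative(row) for row in counts]
    profiles = {name: [r[j] for r in rel] for j, name in enumerate(taxa)}
    print(top_covariate_taxa(profiles, "t1", k=1))  # ['t2']
```

Bray-Curtis is used here only because the abstract names it. In practice, the ranked candidates would be passed to a penalized regression, such as an elastic net, before they enter the model as covariates.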
Publisher
Cold Spring Harbor Laboratory