Abstract
The analysis and prediction of complex traits using microbiome data combined with host genomic information is a topic of great interest. However, numerous questions remain to be answered: How useful can the microbiome be for complex trait prediction? Are microbiability estimates reliable? Can the underlying biological links between the host’s genome, microbiome, and phenome be recovered? Here, we address these issues by (i) developing a novel simulation strategy that uses real microbiome and genotype data as input, and (ii) proposing a variance-component approach which, in the spirit of mediation analyses, quantifies the proportion of phenotypic variance explained by genome and microbiome, and dissects it into direct and indirect effects. The proposed simulation approach can mimic a genetic link between the microbiome and SNP data via a permutation procedure that retains the distributional properties of the data. Results suggest that microbiome data could significantly improve phenotype prediction accuracy, irrespective of whether some abundances are under direct genetic control by the host or not. Overall, random-effects linear methods appear robust for variance component estimation, despite the highly leptokurtic distribution of microbiota abundances. Nevertheless, we observed that accuracy depends in part on the number of microbial taxa influencing the trait of interest. While we conclude that overall genome-microbiome links can be characterized via variance components, we are less optimistic about the possibility of identifying the causative effects, i.e., individual SNPs affecting abundances; power at this level would require much larger sample sizes than the ones typically available for genome-microbiome-phenome data.
Author summary
The microbiome consists of the microorganisms that live in a particular environment, including those within our own bodies. There is consistent evidence that these communities play an important role in numerous traits of relevance, including disease susceptibility and feed efficiency. Moreover, it has been shown that the microbiome can be relatively stable throughout an individual’s life and that it is affected by the host genome. These reasons have prompted numerous studies to determine whether and how the microbiome can be used for prediction of complex phenotypes, using the microbiome either alone or in combination with host genome data. However, numerous questions remain to be answered, such as how reliable parameter estimates are and what the underlying relationship between microbiome, genome, and phenotype is. The few available empirical studies do not provide a clear answer to these problems. Here we address these issues by developing a novel simulation strategy, and we show that, although the microbiome can significantly help in prediction, it will be difficult to retrieve the actual biological basis of interactions between the microbiome and the trait.
Publisher
Cold Spring Harbor Laboratory