Abstract
Despite the surge in data acquisition, few tools can effectively analyze microbiome data to identify correlations between taxonomic composition and continuous environmental factors. Moreover, existing tools do not predict environmental factors in new samples, underscoring the need for new solutions that improve our understanding of microbiome dynamics and fill this prediction gap. Here, we introduce CODARFE, a novel tool for sparse selection of compositional microbiome predictors and prediction of continuous environmental factors. We tested CODARFE against four state-of-the-art tools in two experiments. First, CODARFE outperformed the other tools at predictor selection in 21 out of 24 databases in terms of correlation. Second, among all the tools, CODARFE identified the highest number of bacteria previously linked to environmental factors in human data, exceeding the other tools by at least 7%. We also tested CODARFE in a cross-study setting, using the same biome under different external effects (e.g., ginseng field and cattle for arable soil, and HIV and Crohn’s disease for human gut): a model trained on one dataset was used to predict environmental factors in another dataset, achieving a mean absolute percentage error of 11%. Finally, CODARFE is available in five formats, including a Windows version with a graphical interface, installable source code for Linux servers, and an embedded Jupyter notebook available at MGnify: https://github.com/alerpaschoal/CODARFE.
Publisher
Cold Spring Harbor Laboratory