Abstract
Accurate classification of host phenotypes from microbiome data is essential for future therapies in microbiome-based medicine, and machine learning approaches have proved to be an effective solution for the task. However, the complex nature of the gut microbiome, data sparsity, compositionality and population-specificity remain challenging, which highlights the critical need for standardized methodologies to improve the accuracy and reproducibility of results. Microbiome data transformations can alleviate some of these challenges, but their use in machine learning tasks has largely been unexplored. Our aim was to assess the impact of various data transformations on classification accuracy, generalizability and feature selection, using more than 8,500 samples from 24 shotgun metagenomic datasets. Our findings demonstrate the feasibility of distinguishing between healthy and diseased individuals using microbiome data with minimal dependence on the choice of algorithm and transformation. Remarkably, the presence-absence transformation performed comparably to abundance-based transformations, and only a small subset of predictors was crucial for accurate classification. However, while different transformations resulted in comparable classification performance, the most important features varied significantly, which highlights the need to reevaluate machine learning-based biomarker detection. Our research provides valuable guidance for applying machine learning to microbiome data, offering novel insights and highlighting important areas for future research.
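As an illustration of what is meant by a data transformation here, the following minimal sketch applies three commonly used transformations to a toy samples-by-taxa count matrix. The abstract names only the presence-absence transformation explicitly; relative abundance and the centered log-ratio (CLR) are assumed here as typical abundance-based examples, and the function names, the toy data and the pseudocount of 0.5 are hypothetical choices, not taken from the study.

```python
import numpy as np

def presence_absence(counts):
    """Encode each taxon as 1 if observed in the sample, else 0."""
    return (np.asarray(counts) > 0).astype(int)

def relative_abundance(counts):
    """Scale each sample (row) so that its values sum to 1."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Leave all-zero samples as zeros instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log values minus the per-sample mean log value.

    The pseudocount (an assumed value) avoids taking log(0) for absent taxa.
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logged - logged.mean(axis=1, keepdims=True)

# Toy count matrix: 2 samples (rows) x 3 taxa (columns).
counts = np.array([[10, 0, 30],
                   [0, 5, 5]])

pa = presence_absence(counts)
ra = relative_abundance(counts)
cl = clr(counts)
```

Each transformed matrix keeps the samples-by-taxa shape, so any of them can be passed to the same classifier, which is what makes a like-for-like comparison of transformations possible.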
Publisher
Cold Spring Harbor Laboratory