Machine learning approaches in microbiome research: challenges and best practices

Published: 2023-09-22
Journal: Frontiers in Microbiology (Front. Microbiol.)
Volume: 14
ISSN: 1664-302X

Authors:

Papoutsoglou Georgios, Tarazona Sonia, Lopes Marta B., Klammsteiner Thomas, Ibrahimi Eliana, Eckenberger Julia, Novielli Pierfrancesco, Tonda Alberto, Simeon Andrea, Shigdel Rajesh, Béreux Stéphane, Vitali Giacomo, Tangaro Sabina, Lahti Leo, Temko Andriy, Claesson Marcus J., Berland Magali

Abstract

Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation, and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We demonstrate that using compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm, in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we show how linear modeling by logistic regression, coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots, can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
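The two preprocessing steps the abstract evaluates, a compositional transformation and a filtering step, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes the centered log-ratio (CLR) transform as the compositional transformation, a pseudocount of 1 to handle zero counts, and a prevalence threshold as the filtering method. The function names `clr` and `prevalence_filter`, the threshold values, and the toy count table are all hypothetical.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log-ratio (CLR) transform of one sample's abundance vector.

    Each pseudocount-adjusted value x_i becomes log(x_i / g(x)), where g(x)
    is the geometric mean of the adjusted sample. The pseudocount (1.0 here,
    an illustrative assumption) is added because log(0) is undefined.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [v - mean_log for v in logs]


def prevalence_filter(samples: Sequence[Sequence[float]],
                      min_prevalence: float = 0.1) -> List[int]:
    """Return indices of features that are non-zero in at least
    `min_prevalence` of the samples (hypothetical default threshold)."""
    n_samples = len(samples)
    n_features = len(samples[0])
    keep = []
    for j in range(n_features):
        present = sum(1 for s in samples if s[j] > 0)
        if present / n_samples >= min_prevalence:
            keep.append(j)
    return keep


# Toy count table: 3 samples x 4 taxa (made-up numbers for illustration).
samples = [
    [10, 0, 5, 85],
    [20, 0, 0, 80],
    [15, 1, 4, 80],
]

# Filter first, then transform each sample on the retained taxa only.
keep = prevalence_filter(samples, min_prevalence=0.5)
filtered = [[s[j] for j in keep] for s in samples]
transformed = [clr(s) for s in filtered]

print(keep)  # [0, 2, 3]: taxon 1 is present in only 1 of 3 samples
```

Each CLR-transformed sample sums to zero by construction. In a cross-validated predictive pipeline, a filtering step like this would be fitted on the training folds only, so that held-out performance estimates are not optimistically biased.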

Publisher

Frontiers Media SA

Subject

Microbiology (medical), Microbiology

Cited by 20 articles.

1. FruitSeg30_Segmentation dataset & mask annotations: A novel dataset for diverse fruit segmentation and classification; Data in Brief; 2024-10

2. A comprehensive review of the dairy pasteurization process using machine learning models; Food Control; 2024-10

3. Dermatological Health in the Light of Skin Microbiome Evolution; Journal of Cosmetic Dermatology; 2024-09-09

4. Personalized identification of autism-related bacteria in the gut microbiome using explainable artificial intelligence; iScience; 2024-09

5. MetaBakery: a Singularity implementation of bioBakery tools as a skeleton application for efficient HPC deconvolution of microbiome metagenomic sequencing data to machine learning ready information; Frontiers in Microbiology; 2024-07-30