Interpretable metric learning in comparative metagenomics: The adaptive Haar-like distance

Published:2024-05-20 Issue:5 Volume:20 Page:e1011543
ISSN:1553-7358
Container-title:PLOS Computational Biology
language:en
Short-container-title:PLoS Comput Biol

Author:

Gorman Evan D., Lladser Manuel E.

Abstract

Random forests have emerged as a promising tool in comparative metagenomics because they can predict environmental characteristics based on microbial composition in datasets where β-diversity metrics fall short of revealing meaningful relationships between samples. Nevertheless, despite this efficacy, they lack biological insight in tandem with their predictions, potentially hindering scientific advancement. To overcome this limitation, we leverage a geometric characterization of random forests to introduce a data-driven phylogenetic β-diversity metric, the adaptive Haar-like distance. This new metric assigns a weight to each internal node (i.e., split or bifurcation) of a reference phylogeny, indicating the relative importance of that node in discerning environmental samples based on their microbial composition. Alongside this, a weighted nearest-neighbors classifier, constructed using the adaptive metric, can be used as a proxy for the random forest while maintaining accuracy on par with that of the original forest and another state-of-the-art classifier, CoDaCoRe. As shown in datasets from diverse microbial environments, however, the new metric and classifier significantly enhance the biological interpretability and visualization of high-dimensional metagenomic samples.
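The idea of a node-weighted metric feeding a nearest-neighbors classifier can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each sample has already been mapped to one coordinate per internal node of the reference phylogeny (that projection is not computed here), and it uses a plain weighted Euclidean form and a majority-vote rule as stand-ins for the paper's distance and classifier. The function names, the toy data, and the weights are hypothetical.

```python
import numpy as np


def weighted_distance(x, y, node_weights):
    """Weighted Euclidean distance between two samples.

    x, y: per-node coordinates of two samples (assumed precomputed).
    node_weights: one non-negative weight per internal node (hypothetical values).
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    w = np.asarray(node_weights, dtype=float)
    return float(np.sqrt(np.sum(w * diff ** 2)))


def knn_predict(query, train_X, train_y, node_weights, k=3):
    """Majority vote among the k training samples closest to `query`."""
    dists = np.array(
        [weighted_distance(query, row, node_weights) for row in train_X]
    )
    nearest = np.argsort(dists, kind="stable")[:k]
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Toy data: coordinate 0 separates the two environments, coordinate 1 is noise.
train_X = np.array([[0.0, 0.0], [0.1, 5.0], [1.0, 0.2], [0.9, 4.8]])
train_y = np.array(["soil", "soil", "gut", "gut"])
query = np.array([0.2, 4.9])

# Hypothetical weights: all importance on the first node, none on the second.
node_weights = np.array([1.0, 0.0])

adaptive_label = knn_predict(query, train_X, train_y, node_weights, k=3)
uniform_label = knn_predict(query, train_X, train_y, np.ones(2), k=3)
print(adaptive_label, uniform_label)
```

In this toy setting, uniform weights let the noisy coordinate dominate the neighborhood, while concentrating the weight on the informative node recovers the expected label. The weights themselves also show which node drives the separation, which is the kind of interpretability the abstract describes.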

Funder

National Science Foundation

Publisher

Public Library of Science (PLoS)
