Abstract
Motivation
Microbiota data suffer from technical noise (reflected as an excess of zeros in the count matrix) and the curse of dimensionality. This complicates downstream data analysis and compromises the reliability of scientific discoveries. Data sparsity makes it difficult to obtain a well-defined cluster structure and distorts the abundance distributions. There is currently a growing need for new algorithms with improved capacity to reduce noise and recover missing information.
Results
We present mb-PHENIX, an open-source algorithm developed in Python that recovers taxa abundances from noisy and sparse microbiota data. Our method deals with sparsity in the count matrix (in 16S microbiota and shotgun studies) by applying imputation via diffusion onto the supervised Uniform Manifold Approximation and Projection (sUMAP) space. Our hybrid machine learning approach allows the user to denoise microbiota data. As a result, the differential abundance of microbes among study groups is estimated more accurately, including in cases where abundance analysis otherwise fails.
Availability
The mb-PHENIX algorithm is available at https://github.com/resendislab/mb-PHENIX. An easy-to-use implementation is available on Google Colab (see GitHub).
Contact
Oresendis@inmegen.gob.mx
Supplementary information
Supplementary data are available at Bioinformatics online.
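To make the idea of imputation via diffusion on a low-dimensional embedding more concrete, the following is a minimal sketch under stated assumptions, not the mb-PHENIX implementation. It assumes the sUMAP coordinates have already been computed and simply stands in two synthetic, well-separated clusters for them. The function name `diffusion_impute`, its parameters `knn` and `t`, and the adaptive Gaussian kernel are illustrative choices, not taken from the paper or its repository.

```python
import numpy as np


def diffusion_impute(counts, embedding, knn=5, t=3):
    """Smooth `counts` (samples x taxa) along a graph built on `embedding`.

    Hypothetical helper: the kernel and parameters are assumptions made
    for illustration, not the settings used by mb-PHENIX.
    """
    n = embedding.shape[0]
    # Pairwise Euclidean distances between samples in the embedding.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Adaptive bandwidth: distance to each sample's knn-th nearest neighbour
    # (index 0 of each sorted row is the sample itself).
    sigma = np.sort(dist, axis=1)[:, min(knn, n - 1)]
    sigma = np.maximum(sigma, 1e-12)
    # Gaussian affinities, symmetrised so the graph is undirected.
    affinity = np.exp(-(dist / sigma[:, None]) ** 2)
    affinity = (affinity + affinity.T) / 2
    # Row-normalise into a Markov transition matrix and diffuse for t steps.
    markov = affinity / affinity.sum(axis=1, keepdims=True)
    diffusion_op = np.linalg.matrix_power(markov, t)
    # Each imputed row is a weighted average of the rows of nearby samples.
    return diffusion_op @ counts


# Synthetic stand-in for a supervised embedding: two separated study groups.
rng = np.random.default_rng(0)
embedding = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(5.0, 0.1, size=(10, 2)),
])

# Taxon 0 is abundant in group A, taxon 1 in group B, with two dropouts.
counts = np.zeros((20, 2))
counts[:10, 0] = 10.0
counts[10:, 1] = 10.0
counts[0, 0] = 0.0   # simulated technical zero in group A
counts[15, 1] = 0.0  # simulated technical zero in group B

imputed = diffusion_impute(counts, embedding, knn=3, t=3)
```

In this toy setting the simulated zeros are filled in from neighbouring samples of the same group, while taxa that are truly absent from a group stay close to zero because the two clusters share almost no affinity. Since the diffusion operator is row-stochastic, every imputed value remains within the range of the observed values for that taxon.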
Publisher
Cold Spring Harbor Laboratory
Cited by
6 articles.