Abstract
The composition of the gut microbiota is a known factor in various diseases and has proven to be a strong basis for the automatic classification of disease state. A need for a better understanding of this community at the functional level has since been voiced, as it would enhance the biological interpretability of these approaches. In this paper, we have developed a computational pipeline that integrates the functional annotation of the gut microbiota into an automatic classification process and facilitates downstream interpretation of its results. The pipeline takes taxonomic composition data as input (such as tables of Operational Taxonomic Unit (OTU) or Amplicon Sequence Variant (ASV) abundances) and links each taxonomic unit to its functional annotations by querying the UniProt database. A functional profile of the gut microbiota is then built from these annotations. Both profiles, microbial and functional, are used to train Random Forest classifiers that discriminate unhealthy from control samples. Variables are then selected automatically on the basis of their importance, and the procedure can be iterated until classification performance decreases. This process shows that translating the microbiota into functional profiles yields performance comparable to, albeit slightly lower than, that obtained with microbial profiles. Through repetition, the process also outputs a robust subset of discriminant variables. These selections were shown to be more reliable than those obtained by a state-of-the-art method, and their contents were validated through a manual literature review. The interconnections between selected OTUs and functional annotations were also analyzed, revealing that important annotations emerge from the cumulative influence of non-selected OTUs.
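Two computational steps of the pipeline described above can be sketched in Python: building a functional profile from taxon abundances, and iterating importance-based variable selection until performance drops. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation. The aggregation rule (each OTU adds its full abundance to every annotation it carries), the share of variables kept per round (`keep_fraction`), and the stopping rule (stop as soon as the score falls below the best seen so far) are hypothetical choices. The Random Forest itself is abstracted behind a caller-supplied `evaluate` function that returns a score and per-variable importances. The annotation labels (`func_A`, `func_B`) are placeholders, not UniProt identifiers.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Mapping, Sequence, Tuple

# Hypothetical signature: evaluate(features) -> (score, {feature: importance})
Evaluator = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def functional_profile(
    otu_abundances: Mapping[str, float],
    otu_annotations: Mapping[str, Sequence[str]],
) -> Dict[str, float]:
    """Build one sample's functional profile from its OTU abundances.

    Assumption: each OTU contributes its full abundance to every functional
    annotation it carries; OTUs without annotations contribute nothing.
    """
    profile: Dict[str, float] = defaultdict(float)
    for otu, abundance in otu_abundances.items():
        for annotation in otu_annotations.get(otu, ()):
            profile[annotation] += abundance
    return dict(profile)


def iterative_selection(
    features: Sequence[str],
    evaluate: Evaluator,
    keep_fraction: float = 0.5,  # hypothetical: share of variables kept per round
) -> Tuple[List[str], float]:
    """Keep the most important variables until the classification score drops."""
    current = list(features)
    best_score, importances = evaluate(current)
    while len(current) > 1:
        # Rank the current variables by the importance reported by the classifier.
        ranked = sorted(current, key=lambda f: importances[f], reverse=True)
        candidate = ranked[: max(1, int(len(ranked) * keep_fraction))]
        if len(candidate) == len(current):
            break  # nothing would be removed, so further rounds cannot change the result
        score, candidate_importances = evaluate(candidate)
        if score < best_score:
            break  # performance diminished: keep the previous selection
        current, best_score, importances = candidate, score, candidate_importances
    return current, best_score


# Usage: OTU3 has no annotation, so it does not appear in the functional profile.
profile = functional_profile(
    {"OTU1": 10.0, "OTU2": 5.0, "OTU3": 1.0},
    {"OTU1": ["func_A"], "OTU2": ["func_A", "func_B"]},
)
print(profile)  # {'func_A': 15.0, 'func_B': 5.0}
```

In a real setting, `evaluate` could wrap scikit-learn's `RandomForestClassifier` (reading its `feature_importances_` attribute after fitting) together with a cross-validated score such as `cross_val_score`; that wiring is omitted here to keep the sketch self-contained. Repeating the selection over several random seeds and keeping the variables chosen most often would be one way to approximate the "robust subset" mentioned above, again as an assumption rather than the published procedure.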
Publisher
Cold Spring Harbor Laboratory