Abstract
Motivation: Fungi are key elements in several important ecological functions, ranging from organic matter decomposition to symbiotic associations with plants. Moreover, fungi naturally inhabit the human microbiome and can be causative agents of human infections. An accurate and robust method for fungal ITS classification is not only desirable for better diversity estimation, but can also give us deeper insight into the dynamics of environmental communities and ultimately help us understand whether the abundance of certain species correlates with health and disease. Although many methods have been proposed for taxonomic classification, to the best of our knowledge, none of them considers the taxonomic tree hierarchy when building its models. This, in turn, leads to lower generalization power and a higher risk of classification errors.
Results: In this work, we developed a robust, hierarchical machine learning model for accurate ITS classification, which requires only a small amount of training data and is able to handle imbalanced datasets. We show that our hierarchical model, HiTaC, outperforms state-of-the-art methods when trained on noisy data, consistently achieving higher accuracy and sensitivity across different taxonomic ranks.
Availability: HiTaC is open-source software, with documentation and source code available at https://gitlab.com/dacs-hpi/hitac.
Contact: vitor.cedranpiro@hpi.de
Supplementary information: Supplementary data are available at bioRxiv online.
Publisher
Cold Spring Harbor Laboratory