Abstract
AbstractFungal secondary metabolites (SMs) play a significant role in the diversity of ecological communities, niches, and lifestyles in the fungal kingdom. Many fungal SMs have medically and industrially important properties including antifungal, antibacterial, and antitumor activity, and a single metabolite can display multiple types of bioactivities. The genes necessary for fungal SM biosynthesis are typically found in a single genomic region forming biosynthetic gene clusters (BGCs). However, whether fungal SM bioactivity can be predicted from specific attributes of genes in BGCs remains an open question. We adapted previously used machine learning models for predicting SM bioactivity from bacterial BGC data to fungal BGC data. We trained our models to predict antibacterial, antifungal, and cytotoxic/antitumor bioactivity on two datasets: 1) fungal BGCs (dataset comprised of 314 BGCs), and 2) fungal (314 BGCs) and bacterial BGCs (1,003 BGCs); the second dataset was our control since a previous study using just the bacterial BGC data yielded prediction accuracies as high as 80%. We found that the models trained only on fungal BGCs had balanced accuracies between 51-68%, whereas training on bacterial and fungal BGCs yielded balanced accuracies between 61-74%. The lower accuracy of the predictions from fungal data likely stems from the small number of BGCs and SMs with known bioactivity; this lack of data currently limits the application of machine learning approaches in studying fungal secondary metabolism. However, our data also suggest that machine learning approaches trained on bacterial and fungal data can predict SM bioactivity with good accuracy. With more than 15,000 characterized fungal SMs, millions of putative BGCs present in fungal genomes, and increased demand for novel drugs, efforts that systematically link fungal SM bioactivity to BGCs are urgently needed.
Publisher
Cold Spring Harbor Laboratory