Abstract
Over the past two decades, the availability of genome sequences has expanded exponentially. The collection now encompasses hundreds of thousands of isolate genomes and extends to millions of metagenome-assembled genomes. Rapid and accurate interpretation of these data, along with the profiling of diverse phenotypes such as respiration type, antimicrobial resistance, or carbon utilization, is essential for a wide range of medical and research applications. Here, we use sequence-based functional annotations obtained from the RAST annotation algorithm as predictors and employ six machine learning algorithms (K-Nearest Neighbors, Gaussian Naive Bayes, Support Vector Machines, Neural Networks, Logistic Regression, and Decision Trees) to generate classifiers that can accurately predict phenotypes of unclassified bacterial organisms. We apply this approach in two case studies focused on respiration type (aerobic, anaerobic, and facultative anaerobic) and Gram-stain type (Gram negative and Gram positive). We demonstrate that all six classifiers accurately classify both Gram-stain and respiration-type phenotypes, and we discuss the biological significance of the predicted outcomes. We also present four new applications, deployed in the Department of Energy Systems Biology Knowledgebase (KBase), that enable users to: (i) upload high-quality data to train classifiers; (ii) annotate genomes in the training set with the RAST annotation algorithm; (iii) build six different genome classifiers; and (iv) predict the phenotype of unclassified genomes (https://narrative.kbase.us/#catalog/modules/kb_genomeclassification).
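The general idea of using functional annotations as predictors can be illustrated with a minimal sketch. It is not the authors' implementation or the KBase application code. It assumes each genome is reduced to a presence/absence vector over annotated functional roles, and it hand-rolls a K-Nearest Neighbors classifier (one of the six algorithms named above) with Hamming distance and k = 3, both chosen here only for illustration. The genome names, role names (`role_A` to `role_E`), and labels are invented toy data.

```python
from collections import Counter


def encode(genome_roles, vocabulary):
    """Turn a set of annotated functional roles into a 0/1 presence vector."""
    return [1 if role in genome_roles else 0 for role in vocabulary]


def hamming(u, v):
    """Count the positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def knn_predict(train_vectors, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training vectors."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: hamming(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (hypothetical): genome -> (annotated roles, known phenotype).
training = {
    "genome_1": ({"role_A", "role_B"}, "aerobic"),
    "genome_2": ({"role_A", "role_B", "role_C"}, "aerobic"),
    "genome_3": ({"role_D", "role_E"}, "anaerobic"),
    "genome_4": ({"role_C", "role_D", "role_E"}, "anaerobic"),
}

# The feature space is every role seen in the training set, in a fixed order.
vocabulary = sorted(set().union(*(roles for roles, _ in training.values())))
train_vectors = [encode(roles, vocabulary) for roles, _ in training.values()]
train_labels = [label for _, label in training.values()]

# An unclassified genome (hypothetical) encoded over the same vocabulary.
query = encode({"role_A", "role_B", "role_E"}, vocabulary)
prediction = knn_predict(train_vectors, train_labels, query, k=3)
print(prediction)
```

In practice the feature matrix would have thousands of functional-role columns, and a library implementation with cross-validation would replace the hand-written voting shown here.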
Publisher
Cold Spring Harbor Laboratory