Abstract
Metabolism is the network of chemical reactions that sustain cellular life. Parts of this metabolic network are defined as metabolic pathways containing specific biochemical reactions. The products and reactants of these reactions are called metabolites, which are associated with certain human-defined metabolic pathways. Metabolic knowledgebases, such as the Kyoto Encyclopedia of Genes and Genomes (KEGG), contain metabolites, reactions, and pathway annotations; however, such resources are incomplete due to current limits of metabolic knowledge. To fill in missing metabolite pathway annotations, past machine learning models showed some success at predicting the KEGG Level 2 pathway category involvement of metabolites based on their chemical structure. Here, we present the first machine learning model to predict metabolite association with the more granular KEGG Level 3 metabolic pathways. We used a feature and dataset engineering approach to generate over one million metabolite-pathway entries in the dataset used to train a single binary classifier. This approach produced a mean Matthews correlation coefficient (MCC) of 0.806 ± 0.017 SD across 100 cross-validation iterations. The 172 Level 3 pathways were predicted with an overall MCC of 0.726. Moreover, metabolite association with the 12 Level 2 pathway categories was predicted with an overall MCC of 0.891, representing significant transfer learning from the Level 3 pathway entries. These are the best metabolite-pathway prediction results published so far in the field.
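As a rough illustration of the two ideas the abstract relies on, the sketch below shows one plausible way to turn metabolites and pathways into labeled metabolite-pathway entries for a single binary classifier, and how the Matthews correlation coefficient is computed from binary predictions. The pairing scheme (every metabolite crossed with every pathway), the function names, and the toy identifiers are assumptions made for illustration; the abstract does not describe the actual dataset construction in this detail.

```python
from itertools import product
from math import sqrt


def build_pair_entries(metabolites, pathways, known_pairs):
    """Assumed pairing scheme: one entry per (metabolite, pathway) pair.

    The label is 1 if the pair is an annotated association, else 0.
    """
    return [
        (m, p, int((m, p) in known_pairs))
        for m, p in product(metabolites, pathways)
    ]


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy, hypothetical identifiers (not real KEGG data).
metabolites = ["cpd_A", "cpd_B"]
pathways = ["path_1", "path_2", "path_3"]
known = {("cpd_A", "path_1"), ("cpd_B", "path_3")}

entries = build_pair_entries(metabolites, pathways, known)
labels = [label for _, _, label in entries]
```

Under this assumed scheme, the number of entries grows as the product of the metabolite and pathway counts, which is one way a dataset could reach the scale the abstract reports. MCC is a natural metric for such data because negative pairs far outnumber positive ones, and MCC accounts for all four confusion-matrix cells.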
Publisher
Cold Spring Harbor Laboratory