Affiliation:
1. Software College, Shenyang Normal University, Shenyang 110034, China
Abstract
Identifying natural or synthetic compounds in foods and assaying their bioactivities have contributed significantly to promoting human health. In this work, we propose a machine learning framework to predict 101 classes of health effects of food compounds at a large scale. In this framework, random undersampling boosting (RUSBoost) models are used as base learners to tackle the problem of skewed class distributions, and MACCSKeys similarity spectra are proposed as a feature engineering strategy to represent chemical molecules, including food compounds, natural products and drugs. Computational results show that the RUSBoost learners reduce model bias, and that the knowledge learnt from food compounds transfers well to natural products (0.8406–0.9040 recall rates for antibacterial, antiviral, pesticide and anticancer effects) and drugs (0.789–0.9690 recall rates for antibacterial, antiviral, antineoplastic and analgesic effects). Dozens of novel predicted effects have been validated in recent literature. This evidence shows that the proposed framework could help to find lead compounds from food as potential pharmaceuticals and to repurpose drugs for anticancer, antiviral or antibacterial therapies. Finally, we use the proposed framework to predict beneficial and risky health effects of food flavour compounds as case studies for recipe composition.
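RUSBoost handles skewed class distributions by randomly undersampling the majority class before each boosting round. The sketch below illustrates only that undersampling step on toy data, under stated assumptions: the helper name `random_undersample`, the fully balanced target ratio, and the toy labels are hypothetical and are not taken from the authors' implementation.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_undersample(
    X: Sequence[Sequence[float]], y: Sequence[int], seed: int = 0
) -> Tuple[List[Sequence[float]], List[int]]:
    """Randomly drop samples so every class keeps as many samples as the
    smallest class (hypothetical helper; balanced ratio is an assumption)."""
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)

    # Size of the minority class sets the per-class quota.
    n_min = min(len(indices) for indices in by_class.values())

    # Sample without replacement from each class, then restore input order.
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, n_min))
    kept.sort()

    return [X[i] for i in kept], [y[i] for i in kept]


# Toy, made-up data: 2 positive samples against 8 negative samples.
X = [[float(i)] for i in range(10)]
y = [1, 1] + [0] * 8

X_bal, y_bal = random_undersample(X, y)
counts = Counter(y_bal)
print(counts)
```

In a full RUSBoost learner, a fresh undersampled set like this would be drawn in every boosting iteration before the weak learner is fitted, so different majority-class samples are seen across rounds.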