Abstract
Behavioral larval zebrafish screens leverage a high-throughput small molecule discovery format to find neuroactive molecules relevant to mammalian physiology. We screened a library of 650 central nervous system active compounds in high replicate to train a deep metric learning model on zebrafish behavioral profiles. The machine learning initially exploited subtle artifacts in the phenotypic screen, necessitating a complete experimental re-run with rigorous well-wise randomization. These large matched phenotypic screening datasets (initial and well-randomized) provided a unique opportunity to quantify and understand shortcut learning in a full-scale, real-world drug discovery dataset. The final deep metric learning model substantially outperforms correlation distance, the canonical way of computing distances between profiles, and generalizes to an orthogonal dataset of novel druglike compounds. We validated predictions by prospective in vitro radio-ligand binding assays against human protein targets, achieving a hit rate of 58% despite crossing species and chemical scaffold boundaries. These newly discovered neuroactive compounds exhibited diverse chemical scaffolds, demonstrating that zebrafish phenotypic screens combined with metric learning achieve robust scaffold hopping capabilities.
Publisher
Cold Spring Harbor Laboratory