Abstract
AbstractThe highly-selective blood-brain barrier (BBB) prevents neurotoxic substances in blood from crossing into the extracellular fluid of the central nervous system (CNS). As such, the BBB has a close relationship with CNS disease development and treatment, so predicting whether a substance crosses the BBB is a key task in lead discovery for CNS drugs. Machine learning (ML) is a promising strategy for predicting the BBB permeability, but existing studies have been limited by small datasets with limited chemical diversity. To mitigate this issue, we present a large benchmark dataset, B3DB, complied from 50 published resources and categorized based on experimental uncertainty. A subset of the molecules in B3DB has numerical log BB values (1058 compounds), while the whole dataset has categorical (BBB+ or BBB−) BBB permeability labels (7807). The dataset is freely available at https://github.com/theochem/B3DB and 10.6084/m9.figshare.15634230.v3 (version 3). We also provide some physicochemical properties of the molecules. By analyzing these properties, we can demonstrate some physiochemical similarities and differences between BBB+ and BBB− compounds.
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences,Statistics, Probability and Uncertainty,Computer Science Applications,Education,Information Systems,Statistics and Probability
Cited by
57 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献