Affiliation:
1. Institute of Chemistry, University of Tartu, Ravila 14a, 50411 Tartu, Estonia
Abstract
Aim: The explosion of data based technology has accelerated pattern mining. However, it is clear that quality and bias of data impacts all machine learning and modeling. Results & methodology: A technique is presented for using the distribution of first significant digits of medicinal chemistry features: log P, log S, and p Ka. experimental and predicted, to assess their following of Benford's law as seen in many natural phenomena. Conclusion: Quality of data depends on the dataset sizes, diversity, and magnitudes. Profiling based on drugs may be too small or narrow; using larger sets of experimentally determined or predicted values recovers the distribution seen in other natural phenomena. This technique may be used to improve profiling, machine learning, large dataset assessment and other data based methods for better (automated) data generation and designing compounds.
Subject
Drug Discovery,Pharmacology,Molecular Medicine
Cited by
5 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献