Abstract
Chemical space exploration has gained significant interest as the number of available building blocks has grown, enabling the creation of ultra-large virtual libraries containing billions or even trillions of compounds. This scale makes it challenging to select the most suitable compounds for synthesis, and hit expansion is one such challenge. Recently, Thompson sampling, a probabilistic search approach, has been proposed by Walters et al. to achieve efficiency gains by operating in the reagent space rather than the product space. Here, we aim to address some of its shortcomings and propose optimizations. We introduce a warmup routine to ensure that initial probabilities are set for all reagents with a minimum number of molecules evaluated. Additionally, we propose roulette wheel selection with adapted stop criteria to improve sampling efficiency, and we update the belief distributions of reagents only when they appear in new molecules. We demonstrate that a 100% recovery rate can be achieved by sampling 0.1% of the fully enumerated library, showcasing the effectiveness of our proposed optimizations.
Publisher
Cold Spring Harbor Laboratory