Abstract
The patent literature is a potentially valuable source of bioactivity data. The SureChEMBL database (https://www.surechembl.org/) is a publicly available, large-scale resource containing compounds extracted daily from the full text, images and attachments of patent documents by an automated text- and image-mining pipeline. In this paper we describe a process to prioritise 3.7 million life-science-relevant patents obtained from SureChEMBL by how likely they were to contain bioactivity data for potent small molecules on less-studied targets, as defined by the classification developed by the Illuminating the Druggable Genome (IDG) project. The overall goal was to select a smaller number of patents that could be manually curated and incorporated into the ChEMBL database. We describe the approach taken and the results obtained, and provide some illustrative examples.
Publisher
Cold Spring Harbor Laboratory
Cited by
1 article.