Abstract
Motivation
Literature-Based Discovery (LBD) aims to help researchers identify relations between concepts that are worthy of further investigation by text-mining the biomedical literature. While the LBD literature is rich and the field is considered mature, standard practice in the evaluation of LBD methods is methodologically poor and has not progressed at the same pace as the field. The lack of a properly designed, decent-sized benchmark dataset hinders the progress of the field and its development into applications usable by biomedical experts.
Results
This work presents a method for mining past discoveries from the biomedical literature. It leverages the impact made by a discovery, using descriptive statistics to detect surges in the prevalence of a relation across time. This method allows the collection of a large number of time-stamped discoveries, which can be used for LBD evaluation or other applications. The validity of the method is tested against a baseline representing the state-of-the-art “time sliced” method.
Availability
The source data used in this article are publicly available. The implementation and the resulting data are published under an open-source license: https://github.com/erwanm/medline-discoveries (code), https://zenodo.org/record/5888572 (datasets). An online exploration tool is also provided at https://brainmend.adaptcentre.ie/.
Contact
erwan.moreau@adaptcentre.ie
Publisher
Cold Spring Harbor Laboratory