Abstract
Background
Automatic literature based discovery attempts to uncover new knowledge by connecting existing facts: information extracted from existing publications in the form of $A \rightarrow B$ and $B \rightarrow C$ relations can simply be connected to deduce $A \rightarrow C$. However, with this approach the quantity of proposed connections is often too vast to be useful. It can be reduced by using subject $\rightarrow$ (predicate) $\rightarrow$ object triples as the $A \rightarrow B$ relations, but too many proposed connections remain for manual verification.

Results
Based on the hypothesis that only a small number of the subject–predicate–object triples extracted from a publication represent the paper's novel contribution(s), we explore using BERT embeddings to identify these important triples, and then perform literature based discovery using only them. To build a training set, the method exploits the availability of full texts of publications in the CORD-19 dataset, making use of the fact that a novel contribution is likely to be mentioned in both the abstract and the body of a paper. The resulting tool can nevertheless be applied to papers for which only abstracts are available. Candidate hidden knowledge pairs generated from unfiltered triples are compared with those built from important triples only, using a variety of timeslicing gold standards.

Conclusions
The quantity of proposed knowledge pairs is reduced by a factor of $10^3$, and we show that when the gold standard is designed to avoid rewarding background knowledge, the precision obtained increases by up to a factor of 10. We argue that the gold standard needs to be carefully considered, and release as yet undiscovered candidate knowledge pairs based on important triples alongside this work.
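The $A \rightarrow B$, $B \rightarrow C$ connection step described in the Background can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `candidate_pairs`, the toy triples, and their predicate labels are assumptions made for illustration, and the hand-picked `important` subset merely stands in for the output of the BERT-based filter, which is not reproduced here.

```python
from collections import defaultdict


def candidate_pairs(triples):
    """Join A->B and B->C triples into candidate A->C pairs.

    Pairs already stated directly by some triple are excluded, since they
    are known rather than hidden knowledge. (Hypothetical helper.)
    """
    known = {(s, o) for s, _, o in triples}

    # Index objects by subject so each B can be expanded to its C terms.
    by_subject = defaultdict(set)
    for s, _, o in triples:
        by_subject[s].add(o)

    pairs = set()
    for a, _, b in triples:
        for c in by_subject.get(b, ()):
            if c != a and (a, c) not in known:
                pairs.add((a, c))
    return pairs


# Toy triples (illustrative only, not taken from the paper's data).
triples = [
    ("fish oil", "REDUCES", "blood viscosity"),
    ("blood viscosity", "AFFECTS", "Raynaud's syndrome"),
    ("fish oil", "CONTAINS", "omega-3"),
    ("omega-3", "AFFECTS", "inflammation"),
]

# Assumption: a filter has marked only the first two triples as important.
important = triples[:2]

all_pairs = candidate_pairs(triples)
important_pairs = candidate_pairs(important)
print(len(all_pairs), len(important_pairs))
```

Restricting the join to the important triples shrinks the candidate set, which is the effect the abstract reports at a much larger scale.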
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology