SnorkelPlus: A Novel Approach for Identifying Relationships Among Biomedical Entities Within Abstracts

Published: 2023-05-04
ISSN: 0010-4620
Container-title: The Computer Journal
Language: en

Author:

Kumar Ashutosh¹, Sharaff Aakanksha¹

Affiliation:

1. Department of Computer Science and Engineering, National Institute of Technology Raipur, Raipur, Chhattisgarh 492010, India

Abstract

Identifying relationships between biomedical entities in unstructured biomedical text is a challenging task. SnorkelPlus has been proposed to provide the flexibility to extract these biomedical relations without any human effort. Our proposed model, SnorkelPlus, is aimed at finding connections between gene and disease entities. We achieved three objectives: (i) extracting only gene and disease articles from NCBI’s PubMed or PubMed Central database, (ii) defining reusable label functions and (iii) ensuring label function accuracy using generative and discriminative models. We used deep learning methods to label the training data and achieved an AUROC of 85.60% on the gene and disease corpus generated from PubMed articles. Snorkel achieved an AUPR of 45.73%, which is 2.3% higher than the baseline model. We created a gene–disease relation database using SnorkelPlus from approximately 29 million scientific abstracts without using annotated training datasets. Furthermore, we demonstrated the generalizability of our proposed application on abstracts of PubMed articles enriched with different gene and disease relations. In the future, we plan to design a graph database using Neo4j.
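
The idea of reusable label functions mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the Snorkel library: the cue phrases and function names are hypothetical, and a simple majority vote stands in for the generative model that would normally estimate label function accuracies.

```python
from typing import Callable, List

# Label conventions commonly used in weak supervision.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

# Hypothetical cue phrases, chosen for illustration only (not from the paper).
CAUSAL_CUES = ("causes", "leads to", "mutations in")
NEGATION_CUES = ("not associated with", "no association")


def lf_causal_cue(sentence: str) -> int:
    """Vote POSITIVE when a causal cue phrase appears in the sentence."""
    text = sentence.lower()
    return POSITIVE if any(cue in text for cue in CAUSAL_CUES) else ABSTAIN


def lf_association_phrase(sentence: str) -> int:
    """Vote POSITIVE on 'associated with' unless the phrase is negated."""
    text = sentence.lower()
    if "associated with" in text and "not associated with" not in text:
        return POSITIVE
    return ABSTAIN


def lf_negation(sentence: str) -> int:
    """Vote NEGATIVE when an explicit negated association appears."""
    text = sentence.lower()
    return NEGATIVE if any(cue in text for cue in NEGATION_CUES) else ABSTAIN


LABEL_FUNCTIONS: List[Callable[[str], int]] = [
    lf_causal_cue,
    lf_association_phrase,
    lf_negation,
]


def majority_vote(votes: List[int]) -> int:
    """Combine votes; ties and all-abstain cases yield ABSTAIN."""
    positives = votes.count(POSITIVE)
    negatives = votes.count(NEGATIVE)
    if positives == negatives:
        return ABSTAIN
    return POSITIVE if positives > negatives else NEGATIVE


def label_sentence(sentence: str) -> int:
    """Apply every label function to a sentence and aggregate the votes."""
    return majority_vote([lf(sentence) for lf in LABEL_FUNCTIONS])


if __name__ == "__main__":
    for s in (
        "Mutations in BRCA1 are associated with breast cancer.",
        "TP53 was not associated with asthma in this cohort.",
        "BRCA1 and asthma were both measured.",
    ):
        print(label_sentence(s), s)
```

In a full pipeline, the aggregated labels would serve as noisy training data for a discriminative model; the majority vote here is only the simplest possible substitute for a learned label model.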

Funder

Department of Computer Science and Engineering

National Institute of Technology

Publisher

Oxford University Press (OUP)

Subject

General Computer Science

Link

https://academic.oup.com/comjnl/advance-article-pdf/doi/10.1093/comjnl/bxad051/50202436/bxad051.pdf

Reference: 30 articles.

1. Biological network exploration with cytoscape 3;Su;Curr. Protoc. Bioinformatics,2014

2. Snorkel: Fast training set generation for information extraction;Ratner,2017

3. Ppicurator: a tool for extracting comprehensive protein–protein interaction information;Li;Proteomics,2019

4. The research on gene-disease association based on text-mining of pubmed;Zhou;BMC Bioinformatics,2018

5. Pubmed 2.0;White;Med. Ref. Serv. Q.,2020

Cited by 1 article.

1. Knowledge-injected Prompt Learning for Chinese Biomedical Entity Normalization;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-08-23