Surveying biomedical relation extraction: a critical examination of current datasets and the proposal of a new resource

Author:

Huang Ming-Siang (1, 2, 3, 4, 5), Han Jen-Chieh (6, 7), Lin Pei-Yen (1, 2), You Yu-Ting (1, 2), Tsai Richard Tzong-Han (6, 7, 8), Hsu Wen-Lian (1, 2, 4, 5)

Affiliation:

1. Intelligent Agent Systems Laboratory, Department of Computer Science and Information Engineering, New Taipei City, Taiwan

2. Asia University, Department of Computer Science and Information Engineering, New Taipei City, Taiwan

3. National Institute of Cancer Research, National Health Research Institutes, Tainan, Taiwan

4. Department of Computer Science and Information Engineering, College of Information and Electrical Engineering, Taichung, Taiwan

5. Asia University, College of Information and Electrical Engineering, Taichung, Taiwan

6. Intelligent Information Service Research Laboratory, Department of Computer Science and Information Engineering, Taoyuan, Taiwan

7. National Central University, Department of Computer Science and Information Engineering, Taoyuan, Taiwan

8. Center for Geographic Information Science, Research Center for Humanities and Social Sciences, Academia Sinica, Taipei, Taiwan

Abstract

Natural language processing (NLP) has become an essential technique in various fields, offering a wide range of possibilities for analyzing data and developing diverse NLP tasks. In the biomedical domain, understanding the complex relationships between compounds and proteins is critical, especially in the context of signal transduction and biochemical pathways. Among these relationships, protein–protein interactions (PPIs) are of particular interest, given their potential to trigger a variety of biological reactions. To improve the ability to predict PPI events, we propose the protein event detection dataset (PEDD), which comprises 6823 abstracts, 39 488 sentences and 182 937 gene pairs. Our PEDD dataset has been utilized in the AI CUP Biomedical Paper Analysis competition, where systems are challenged to predict 12 different relation types. In this paper, we review the state-of-the-art relation extraction research and provide an overview of the PEDD’s compilation process. Furthermore, we present the results of the PPI extraction competition and evaluate several language models’ performances on the PEDD. This paper’s outcomes will provide a valuable roadmap for future studies on protein event detection in NLP. By addressing this critical challenge, we hope to enable breakthroughs in drug discovery and enhance our understanding of the molecular mechanisms underlying various diseases.

Funder

Ministry of Education

Ministry of Science and Technology

Bioinformatics Core Facility for Biotechnology and Pharmaceuticals

Publisher

Oxford University Press (OUP)

