Affiliation:
1. Health Innovation and Transformation Centre, Federation University, Ballarat, VIC 3842, Australia
2. BioThink Pty Ltd., Brisbane, QLD 4020, Australia
Abstract
Relation extraction from biological publications plays a pivotal role in accelerating scientific discovery and advancing medical research. While vast amounts of this knowledge is stored within the published literature, extracting it manually from this continually growing volume of documents is becoming increasingly arduous. Recently, attention has been focused towards automatically extracting such knowledge using pre-trained Large Language Models (LLM) and deep-learning algorithms for automated relation extraction. However, the complex syntactic structure of biological sentences, with nested entities and domain-specific terminology, and insufficient annotated training corpora, poses major challenges in accurately capturing entity relationships from the unstructured data. To address these issues, in this paper, we propose a Knowledge-based Intelligent Text Simplification (KITS) approach focused on the accurate extraction of biological relations. KITS is able to precisely and accurately capture the relational context among various binary relations within the sentence, alongside preventing any potential changes in meaning for those sentences being simplified by KITS. The experiments show that the proposed technique, using well-known performance metrics, resulted in a 21% increase in precision, with only 25% of sentences simplified in the Learning Language in Logic (LLL) dataset. Combining the proposed method with BioBERT, the popular pre-trained LLM was able to outperform other state-of-the-art methods.
Subject
Computer Networks and Communications,Human-Computer Interaction,Communication
Cited by
1 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献