An Open-Set Semi-Supervised Multi-Task Learning Framework for Context Classification in Biomedical Texts

Published: 2024-07-23

Author:

Tang Difei, Chow Tam Thomas Yu, Miskov-Zivanov Natasa

Abstract

In biomedical research, knowledge about the relationships between entities, including genes, proteins, and drugs, is vital for unraveling the complexities of biological processes and mechanisms. Although text mining methods have recently demonstrated great success in biomedical relation extraction, the extraction process often ignores context information such as cell type, species, and anatomy, which are crucial components of biological knowledge. Moreover, existing methods that address this problem as a text classification task are limited by the lack of labeled examples due to costly manual context annotation: although they can achieve high precision, they perform poorly in unseen contexts. Additionally, despite some attempts to generate more examples automatically from the literature, these methods are often restricted to a fixed generation pattern. This study introduces an open-set semi-supervised multi-task learning framework for biomedical context classification in a practical setting. The proposed scheme assumes that the unlabeled data contains both in-distribution (ID) and out-of-distribution (OOD) examples. The main challenge in context classification is the limited data with sparse distribution across different context types. Therefore, we first build a large-scale context classification dataset using an automatic span annotation method by grounding two manually curated corpora. Next, we develop an outlier detector to properly distinguish the ID and OOD data. Moreover, to capture the inherent relationships between biomedical relations and their associated contexts, the context classification is treated as an individual task, and we design a multi-task learning (MTL) architecture that seamlessly integrates with the semi-supervised learning strategies during training. Extensive experiments on the context classification dataset demonstrate that the proposed method outperforms baselines and efficiently extracts context without requiring much manually annotated data for training.

Publisher

Cold Spring Harbor Laboratory
