Author:
Guan Zhengyi,Zhou Xiaobing
Abstract
AbstractBackgroundThe biomedical literature is growing rapidly, and it is increasingly important to extract meaningful information from the vast amount of literature. Biomedical named entity recognition (BioNER) is one of the key and fundamental tasks in biomedical text mining. It also acts as a primitive step for many downstream applications such as relation extraction and knowledge base completion. Therefore, the accurate identification of entities in biomedical literature has certain research value. However, this task is challenging due to the insufficiency of sequence labeling and the lack of large-scale labeled training data and domain knowledge.ResultsIn this paper, we use a novel word-pair classification method, design a simple attention mechanism and propose a novel architecture to solve the research difficulties of BioNER more efficiently without leveraging any external knowledge. Specifically, we break down the limitations of sequence labeling-based approaches by predicting the relationship between word pairs. Based on this, we enhance the pre-trained model BioBERT, through the proposed prefix and attention map dscrimination fusion guided attention and propose the E-BioBERT. Our proposed attention differentiates the distribution of different heads in different layers in the BioBERT, which enriches the diversity of self-attention. Our model is superior to state-of-the-art compared models on five available datasets: BC4CHEMD, BC2GM, BC5CDR-Disease, BC5CDR-Chem, and NCBI-Disease, achieving F1-score of 92.55%, 85.45%, 87.53%, 94.16% and 90.55%, respectively.ConclusionCompared with many previous various models, our method does not require additional training datasets, external knowledge, and complex training process. The experimental results on five BioNER benchmark datasets demonstrate that our model is better at mining semantic information, alleviating the problem of label inconsistency, and has higher entity recognition ability. More importantly, we analyze and demonstrate the effectiveness of our proposed attention.
Funder
National Natural Science Foundation of China
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Cited by
6 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献