Affiliation:
1. College of Computer and Information Engineering, Inner Mongolia Agricultural University, Hohhot 010018, P. R. China
2. China College of Veterinary Medicine, Inner Mongolia Agricultural University, Hohhot 010018, P. R. China
Abstract
Deep learning approaches to biomedical data mining often fail to exploit prior knowledge and adapt poorly to the complexity of biomedical literature. Entity recognition is a fundamental task in information extraction and supplies data for downstream Natural Language Processing (NLP) tasks. Bovine Viral Diarrhea Virus (BVDV) causes considerable economic losses in the cattle industry through calf diarrhea, bovine respiratory syndrome, and abortion in cows. This study aims to extract information on BVDV from the relevant literature and build a knowledge base. The proposed method enhances feature extraction in the BioBERT pre-trained model by using the Machine Reading Comprehension (MRC) framework for information fusion, extracts contextual information from the corpus in both directions with a Bi-LSTM network, and applies a CRF layer for decoding and prediction. The study constructs a BVDV Corpus with 22 biomedical entities and introduces the BioBERT-Bi-LSTM-CRF Integrated with MRC (BBCM) model for Named Entity Recognition (NER), which combines prior knowledge with the MRC framework. The BBCM model achieves F1-scores of 78.79% and 76.3% on the public datasets JNLPBA and GENIA, respectively, and 67.52% on the BVDV Corpus, outperforming the other models it was compared with. This research presents a targeted NER method for BVDV that identifies BVDV-related entities and supports the exploration of their relationships, thus providing data support for downstream NLP tasks.
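The CRF decoding step mentioned in the abstract can be illustrated with a minimal sketch of Viterbi decoding. This is not the authors' implementation: the tag set, the transition scores, and the function name `viterbi_decode` are hypothetical, and the emission scores stand in for what a BioBERT and Bi-LSTM encoder would produce for each token.

```python
from typing import List, Sequence

# Hypothetical tag set for illustration only (not the 22 BVDV Corpus entities).
TAGS = ["O", "B-VIRUS", "I-VIRUS"]

# Hypothetical transition scores: TRANSITIONS[i][j] scores moving from tag i to tag j.
# The large penalty on O -> I-VIRUS discourages an entity that starts with an I- tag.
TRANSITIONS = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 1.0],    # from B-VIRUS
    [0.0, 0.0, 0.5],    # from I-VIRUS
]


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> List[int]:
    """Return the highest-scoring tag index sequence.

    emissions[t][j] is the score of tag j at token position t
    (assumed here to come from an upstream encoder).
    """
    if not emissions:
        return []
    num_tags = len(emissions[0])
    score = list(emissions[0])          # best score of any path ending in each tag
    backpointers: List[List[int]] = []  # best previous tag for each position and tag

    for t in range(1, len(emissions)):
        next_score = []
        pointers = []
        for j in range(num_tags):
            # Pick the previous tag that maximizes the path score into tag j.
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            next_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = next_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final tag.
    path = [max(range(num_tags), key=lambda j: score[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up emission scores for a three-token input.
example_emissions = [
    [0.1, 2.0, 1.0],
    [0.5, 0.2, 1.5],
    [2.0, 0.1, 0.3],
]
decoded = [TAGS[i] for i in viterbi_decode(example_emissions, TRANSITIONS)]
```

Decoding over the whole sequence, rather than taking the best tag per token, lets the transition scores rule out inconsistent label sequences such as an I- tag directly after O.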
Funder
Inner Mongolia University of Science and Technology
Education Department of Inner Mongolia Autonomous Region
the Natural Science Foundation of Inner Mongolia of China under Grant
Publisher
World Scientific Pub Co Pte Ltd