Authors:
Li Meijing, Yang Hao, Liu Yuxin
Abstract
BACKGROUND: With the exponential growth of the biomedical literature, text mining is becoming increasingly important in the medical domain. Named entity recognition (NER) is a primary task in text mining and a prerequisite and critical component for building medical knowledge graphs, medical question-answering systems, and medical text classifiers. OBJECTIVE: This study aims to recognize biomedical entities effectively by fusing multi-feature embeddings. Multiple features provide more comprehensive information, which enables better predictions. METHODS: First, three kinds of features are generated at the word representation layer: deep contextual word-level features, local character-level features, and part-of-speech features. The resulting word representation vectors are fed into a BiLSTM to capture dependency information. Finally, a conditional random field (CRF) learns the features of the state sequences to obtain the globally optimal tag sequence. RESULTS: The experimental results showed that the model outperformed other state-of-the-art methods in overall performance on six of eight datasets covering four biomedical entity types. CONCLUSION: The proposed method improves prediction results. Because fusing multi-feature embeddings enhances the semantic information, the method comprehensively accounts for the factors relevant to named entity recognition.
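The METHODS pipeline ends with a CRF that selects the globally optimal tag sequence rather than the best tag per token. A minimal sketch of that decoding step (Viterbi over a linear-chain CRF) is shown below, together with a per-token feature fusion step. Several things here are assumptions, not details from the abstract: fusion is done by simple concatenation, the BIO tag set and all scores are invented for illustration, the BiLSTM itself is omitted (its per-token tag scores are given directly as `emissions`), and the helper names `fuse_features` and `viterbi_decode` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical BIO tag set, assumed for illustration only.
TAGS = ["O", "B-Disease", "I-Disease"]


def fuse_features(word_vec: Sequence[float],
                  char_vec: Sequence[float],
                  pos_vec: Sequence[float]) -> List[float]:
    """Assumed fusion: concatenate contextual word-level, char-level and POS
    vectors into one token representation (the input a BiLSTM would consume)."""
    return list(word_vec) + list(char_vec) + list(pos_vec)


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   start: Sequence[float],
                   end: Sequence[float]) -> Tuple[float, List[int]]:
    """Find the highest-scoring tag sequence of a linear-chain CRF.

    emissions[t][j]   score of tag j at token t (e.g. projected BiLSTM output)
    transitions[i][j] score of moving from tag i to tag j
    start[j], end[j]  scores for starting / ending the sentence in tag j
    """
    n_tags = len(start)
    # Best score of any path ending in tag j at the first token.
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # Best previous tag i for reaching tag j at token t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    final = [score[j] + end[j] for j in range(n_tags)]
    last = max(range(n_tags), key=lambda j: final[j])
    # Follow backpointers from the last token to recover the full path.
    path = [last]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return final[last], path


if __name__ == "__main__":
    # Three tokens, e.g. "acute leukemia patients"; all scores are made up.
    emissions = [[0.5, 1.0, 0.2], [0.3, 0.2, 0.9], [2.0, 0.1, 0.3]]
    # Penalize O -> I-Disease and starting in I-Disease; reward B/I -> I.
    transitions = [[0, 0, -10], [0, 0, 1], [0, 0, 1]]
    start, end = [0, 0, -10], [0, 0, 0]
    best, path = viterbi_decode(emissions, transitions, start, end)
    print([TAGS[i] for i in path], round(best, 2))
    # prints ['B-Disease', 'I-Disease', 'O'] 4.9
```

The transition scores are what make the decision global: an invalid move such as O to I-Disease is penalized for the whole sequence, which per-token argmax decoding cannot enforce.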
Subject
Health Informatics, Biomedical Engineering, Information Systems, Biomaterials, Bioengineering, Biophysics