Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations

Published:2015-01-19 Issue:S1 Volume:7
ISSN:1758-2946
Container-title:Journal of Cheminformatics
language:en
Short-container-title:J Cheminform

Author:

Munkhdalai Tsendsuren,Li Meijing,Batsuren Khuyagbaatar,Park Hyeon Ah,Choi Nak Hyeon,Ryu Keun Ho

Abstract

Background: Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite for effective text mining of biochemical text data. With the recent growth in the amount of biomedical literature, exploiting unlabeled text to improve system performance has become an active and challenging research topic in text mining. We present a semi-supervised learning method that efficiently exploits unlabeled data to incorporate domain knowledge into a named entity recognition model and thereby improve system performance. The proposed method comprises Natural Language Processing (NLP) tasks for text preprocessing, word representation features learned from a large amount of text data for feature extraction, and conditional random fields for token classification. Apart from free text in the domain, the method does not rely on any lexicon or dictionary, which keeps the system applicable to other NER tasks on bio-text data.

Results: We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or to biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER. It scales to millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems through the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved F-measures of 85.68% and 86.47% on the testing sets of the CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and an F-measure of 87.04% on the official testing set of the BioCreative II gene mention task, showing strong performance in both chemical and biomedical NER. The BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences,Computer Graphics and Computer-Aided Design,Physical and Theoretical Chemistry,Computer Science Applications

Link

http://link.springer.com/content/pdf/10.1186/1758-2946-7-S1-S9.pdf

Reference 34 articles.

1. Dai HJ, Chang YC, Tsai RTH, Hsu WL: New Challenges for Biological Text-Mining in the Next Decade. Journal of computer science and technology. 2010, 25: 169-179. 10.1007/s11390-010-9313-5.

2. Rocktäschel T, Weidlich M, Leser U: ChemSpot: a hybrid system for chemical named entity recognition. Bioinformatics. 2012, 28: 1633-1640. 10.1093/bioinformatics/bts183.

3. Hettne KM, Stierum RH, Schuemie MJ, Hendriksen PJM, Schijvenaars BJA, Mulligen EMV, Kleinjans J, Kors JA: A dictionary to identify small molecules and drugs in free text. Bioinformatics. 2009, 25: 2983-2991. 10.1093/bioinformatics/btp535.

4. Segura-Bedmar I, Martínez P, Segura-Bedmar M: Drug name recognition and classification in biomedical texts: A case study outlining approaches underpinning automated systems. Drug Discovery Today. 2008, 13: 816-823. 10.1016/j.drudis.2008.06.001.

5. Zhao S: Named Entity Recognition in Biomedical Texts using an HMM Model. Proceedings of the Joint Workshop on Natural Language Processing in Biomedicine and its Applications. Edited by: Nigel C. National Institute of Informatics, Patrick R. University Hospital of Geneva and EPFL, Adeline N. LIPN. 2004, 84-87.

Cited by 30 articles.

1. Drug–Drug Interaction Relation Extraction Based on Deep Learning: A Review;ACM Computing Surveys;2024-03-13

2. Cancer hallmark analysis using semantic classification with enhanced topic modelling on biomedical literature;Multimedia Tools and Applications;2024-02-17

3. Machine Reading Comprehension Model in Domain-Transfer Task;Lobachevskii Journal of Mathematics;2023-08

4. Learning adaptive representations for entity recognition in the biomedical domain;Journal of Biomedical Semantics;2021-05-17

5. Named Entity Recognition and Relation Detection for Biomedical Information Extraction;Frontiers in Cell and Developmental Biology;2020-08-28