BioCreAtIvE Task1A: entity identification with a stochastic tagger

Published:2005-05 Issue:S1 Volume:6
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en
Short-container-title:BMC Bioinformatics

Author:

Kinoshita Shuhei,Cohen K Bretonnel,Ogren Philip V,Hunter Lawrence

Abstract

Background: Our approach to Task 1A was inspired by Tanabe and Wilbur's ABGene system [1, 2]. Like Tanabe and Wilbur, we approached the problem as one of part-of-speech tagging, adding a GENE tag to the standard tag set. Where their system uses the Brill tagger, we used TnT, the Trigrams 'n' Tags HMM-based part-of-speech tagger [3]. Based on careful error analysis, we implemented a set of post-processing rules to correct both false positives and false negatives. We participated in both the open and the closed divisions; for the open division, we made use of data from NCBI.

Results: Our base system without post-processing achieved a precision and recall of 68.0% and 77.2%, respectively, giving an F-measure of 72.3%. The full system with post-processing achieved a precision and recall of 80.3% and 80.5%, respectively, giving an F-measure of 80.4%. We achieved a slight improvement (F-measure = 80.9%) by employing a dictionary-based post-processing step for the open division. We placed third in both the open and the closed division.

Conclusion: Our results show that a part-of-speech tagger can be augmented with post-processing rules, resulting in an entity identification system that competes well with other approaches.
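The F-measures quoted in the abstract can be checked against the reported precision and recall. The abstract does not say which F-measure variant was used; the sketch below assumes the balanced F-measure (the harmonic mean of precision and recall), and the function name `f_measure` is purely illustrative. Under that assumption, the two fully reported configurations reproduce the published figures.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced (F1) variant.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall (in percent) as reported in the abstract.
reported = {
    "base system": (68.0, 77.2),
    "with post-processing": (80.3, 80.5),
}

for name, (p, r) in reported.items():
    print(f"{name}: F = {f_measure(p, r):.1f}%")
# base system: F = 72.3%
# with post-processing: F = 80.4%
```

The open-division figure (F-measure = 80.9%) cannot be checked the same way, because the abstract does not give the precision and recall behind it.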

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-6-S1-S4.pdf

Reference: 7 articles.

1. Tanabe L, Wilbur WJ: Tagging gene and protein names in biomedical text. Bioinformatics 2002, 18(8):1124–1132. 10.1093/bioinformatics/18.8.1124

2. Tanabe L, Wilbur WJ: Tagging gene and protein names in full text articles. Proceedings of the workshop on biomedical natural language processing in the biomedical domain, Association for Computational Linguistics 2002, 9–13.

3. Brants T: TnT – A Statistical Part-of-Speech Tagger. Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000)

4. Fukuda K, Tsunoda T, Tamura A, Takagi T: Toward information extraction: identifying protein names from biological papers. Pacific Symposium for Biocomputing 1998, 3: 705–716.

5. Olsson F, Eriksson G, Franzén K, Asker L, Lidén P: Notions of correctness when evaluating protein name taggers. Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002) 765–771.

Cited by 18 articles.

1. Automated recognition and mapping of building management system (BMS) data points for building energy modeling (BEM);Building Simulation;2020-03-20

2. Information theoretic-PSO-based feature selection: an application in biomedical entity extraction;Knowledge and Information Systems;2018-09-21

3. Coreference annotation and resolution in the Colorado Richly Annotated Full Text (CRAFT) corpus of biomedical journal articles;BMC Bioinformatics;2017-08-17

4. Feature selection for entity extraction from multiple biomedical corpora: A PSO-based approach;Soft Computing;2017-08-17

5. A framework for ontology-based question answering with application to parasite immunology;Journal of Biomedical Semantics;2015-07-17