Author:
Kinoshita Shuhei,Cohen K Bretonnel,Ogren Philip V,Hunter Lawrence
Abstract
Abstract
Background
Our approach to Task 1A was inspired by Tanabe and Wilbur's ABGene system [1, 2]. Like Tanabe and Wilbur, we approached the problem as one of part-of-speech tagging, adding a GENE tag to the standard tag set. Where their system uses the Brill tagger, we used TnT, the Trigrams 'n' Tags HMM-based part-of-speech tagger [3]. Based on careful error analysis, we implemented a set of post-processing rules to correct both false positives and false negatives. We participated in both the open and the closed divisions; for the open division, we made use of data from NCBI.
Results
Our base system without post-processing achieved a precision and recall of 68.0% and 77.2%, respectively, giving an F-measure of 72.3%. The full system with post-processing achieved a precision and recall of 80.3% and 80.5% giving an F-measure of 80.4%. We achieved a slight improvement (F-measure = 80.9%) by employing a dictionary-based post-processing step for the open division. We placed third in both the open and the closed division.
Conclusion
Our results show that a part-of-speech tagger can be augmented with post-processing rules resulting in an entity identification system that competes well with other approaches.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology
Reference7 articles.
1. Tanabe L, Wilbur WJ: Tagging gene and protein names in biomedical text. Bioinformatics 2002, 18(8):1124–1132. 10.1093/bioinformatics/18.8.1124
2. Tanabe L, Wilbur WJ: Tagging gene and protein names in full text articles. Proceedings of the workshop on biomedical natural language processing in the biomedical domain Association for Computational Linguistics 2002, 9–13.
3. Brants T: TnT – A Statistical Part-of-Speech Tagger. Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000)
4. Fukuda K, Tsunoda T, Tamura A, Takagi T: Toward information extraction: identifying protein names from biological papers. Pacific Symposium for Biocomputing 1998, 3: 705–716.
5. Fredrik O, Eriksson G, Franzén K, Asker L, Lidén P: Notions of correctness when evaluating protein name taggers. Proceedings of the 19th International Conference on Computational Linguistics (COLING 2002) 765–771.
Cited by
18 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献