Gene/protein name recognition based on support vector machine using dictionary as features

Published:2005-05 Issue:S1 Volume:6 Page:
ISSN:1471-2105
Container-title:BMC Bioinformatics
language:en

Author:

Mitsumori Tomohiro,Fation Sevrani,Murata Masaki,Doi Kouichi,Doi Hirohumi

Abstract

Background: Automated information extraction from biomedical literature is important because a vast amount of biomedical literature has been published. Recognition of the biomedical named entities is the first step in information extraction. We developed an automated recognition system based on the SVM algorithm and evaluated it in Task 1.A of BioCreAtIvE, a competition for automated gene/protein name recognition.

Results: In the work presented here, our recognition system uses the feature set of the word, the part-of-speech (POS), the orthography, the prefix, the suffix, and the preceding class. We call these features "internal resource features", i.e., features that can be found in the training data. Additionally, we consider the features of matching against dictionaries to be external resource features. We investigated and evaluated the effect of these features as well as the effect of tuning the parameters of the SVM algorithm. We found that the dictionary matching features contributed slightly to the improvement in the performance of the f-score. We attribute this to the possibility that the dictionary matching features might overlap with other features in the current multiple feature setting.

Conclusion: During SVM learning, each feature alone had a marginally positive effect on system performance. This supports the fact that the SVM algorithm is robust on the high dimensionality of the feature vector space and means that feature selection is not required.
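
As an illustration of the feature set named in the abstract (word, POS, orthography, prefix, suffix, preceding class, and dictionary matching), the following is a minimal sketch of per-token feature extraction. It is not the authors' implementation: the function names, the orthographic categories, the affix length of 3, and the toy dictionary are all assumptions made for this example.

```python
import re

# Hypothetical toy dictionary (external resource feature); the dictionaries
# actually used in the paper are not listed in this record.
GENE_DICTIONARY = {"p53", "brca1", "interleukin-2"}


def orthography(token):
    """Return a coarse orthographic class (illustrative categories only)."""
    if token.isdigit():
        return "digits"
    if re.fullmatch(r"[A-Z]+", token):
        return "all_caps"
    if re.search(r"[A-Za-z]", token) and re.search(r"\d", token):
        return "alnum_mixed"
    if token[:1].isupper():
        return "init_cap"
    if token.islower():
        return "lowercase"
    return "other"


def token_features(tokens, pos_tags, index, previous_label, affix_len=3):
    """Build a feature dict for tokens[index].

    previous_label stands in for the "preceding class" feature: the label
    assigned to the token just before this one.
    """
    token = tokens[index]
    return {
        "word": token.lower(),
        "pos": pos_tags[index],
        "orth": orthography(token),
        "prefix": token[:affix_len].lower(),
        "suffix": token[-affix_len:].lower(),
        "prev_class": previous_label,
        "in_dict": token.lower() in GENE_DICTIONARY,
    }


if __name__ == "__main__":
    tokens = ["The", "p53", "protein"]
    pos_tags = ["DT", "NN", "NN"]  # assumed to come from an external POS tagger
    print(token_features(tokens, pos_tags, 1, "O"))
```

Feature dicts of this shape would typically be one-hot encoded into a sparse vector before being passed to an SVM; how the original system encoded its features is not stated in the abstract.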

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics,Computer Science Applications,Molecular Biology,Biochemistry,Structural Biology

Link

https://link.springer.com/content/pdf/10.1186/1471-2105-6-S1-S8.pdf

References: 18 articles.

1. Ono T, Hishigaki H, Tanigami A, Takagi T: Automatic extraction of information on protein-protein interactions from biomedical literature. Bioinformatics 2001, 17(2):155–161. 10.1093/bioinformatics/17.2.155

2. Blaschke C, Valencia A: Can bibliographic pointers for known biological data be found automatically? Protein interactions as a case study. Comparative and Functional Genomics 2001, 2: 196–206. 10.1002/cfg.91

3. Temkin JM, Gilder MR: Extraction of protein interaction information from unstructured text using a context-free grammar. Bioinformatics 2003, 19(16):2046–2053. 10.1093/bioinformatics/btg279

4. Fukuda K, Tamura A, Tsunoda T, Takagi T: Toward Information Extraction: Identifying protein names from biological papers. Proceedings of the Pacific Symposium on Biocomputing 1998, 707–718.

5. Franzén K, Eriksson G, Asker FOL, Lidén P, Cöster J: Protein names and how to find them. International Journal of Medical Informatics 2002, 67: 49–61. 10.1016/S1386-5056(02)00052-7

Cited by 39 articles.

1. From zero to hero: Harnessing transformers for biomedical named entity recognition in zero- and few-shot contexts;Artificial Intelligence in Medicine;2024-10

2. From Zero to Hero: Harnessing Transformers for Biomedical Named Entity Recognition in Zero- and Few-Shot Contexts;2023

3. Improving the co-training algorithm to enhance semi-supervised learning results;2022 IEEE International Conference on Big Data (Big Data);2022-12-17

4. ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition;Complexity;2021-03-13

5. A Support Vector Machine Learning for the Upward and Downward Tendency Theory of Traditional Chinese Medicine;2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM);2020-12-16