Author:
Czarnecki Jan, Nobeli Irene, Smith Adrian M, Shepherd Adrian J
Abstract
Background
Biological text mining research is increasingly focusing on the extraction of complex relationships relevant to the construction and curation of biological networks and pathways. However, one important category of pathway, the metabolic pathway, has been largely neglected.
Here we present a relatively simple method for extracting metabolic reaction information from free text that scores different permutations of assigned entities (enzymes and metabolites) within a given sentence based on the presence and location of stemmed keywords. This method extends an approach that has proved effective in the context of the extraction of protein–protein interactions.
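The scoring idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword list, the weights, the `crude_stem` helper (a stand-in for a real stemmer), and the function names are all assumptions made for illustration. It enumerates ordered (substrate, product) permutations of the tagged metabolites in a sentence and scores each one by whether a stemmed keyword is present and whether it lies between the two metabolites.

```python
from itertools import permutations

# Hypothetical stemmed keywords; the abstract does not list the real set.
KEYWORD_STEMS = {"convert", "produc", "yield", "catalys", "catalyz"}


def crude_stem(token):
    """Crude stand-in for a real stemmer (assumption): strip a common suffix."""
    token = token.lower().strip(".,;:()")
    for suffix in ("ing", "es", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def score_assignment(tokens, substrate_idx, product_idx):
    """Score one (substrate, product) ordering; the weights are illustrative."""
    score = 0.0
    for position, token in enumerate(tokens):
        if crude_stem(token) not in KEYWORD_STEMS:
            continue
        score += 1.0  # presence: a reaction keyword occurs in the sentence
        if substrate_idx < position < product_idx:
            score += 2.0  # location: keyword sits between substrate and product
    return score


def best_reaction(tokens, enzymes, metabolites):
    """Return the top-scoring (score, enzyme, substrate, product), or None.

    `enzymes` and `metabolites` map entity names to token positions.
    """
    candidates = []
    for enzyme in enzymes:
        for (sub, s_idx), (prod, p_idx) in permutations(metabolites.items(), 2):
            score = score_assignment(tokens, s_idx, p_idx)
            candidates.append((score, enzyme, sub, prod))
    if not candidates:
        return None
    return max(candidates, key=lambda candidate: candidate[0])


tokens = "Glucose is converted to glucose-6-phosphate by hexokinase".split()
result = best_reaction(
    tokens,
    enzymes={"hexokinase": 6},
    metabolites={"glucose": 0, "glucose-6-phosphate": 4},
)
print(result)
```

In this toy sentence the ordering glucose → glucose-6-phosphate outscores the reverse ordering because the keyword "converted" falls between the two metabolite mentions. A realistic system would need a proper stemmer, a curated keyword list, and upstream entity recognition for enzymes and metabolites.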
Results
When evaluated on a set of manually curated metabolic pathways using standard performance criteria, our method performs surprisingly well. Precision and recall rates are comparable to those previously achieved for the well-known protein–protein interaction extraction task.
Conclusions
We conclude that automated metabolic pathway construction is more tractable than has often been assumed, and that (as in the case of protein–protein interaction extraction) relatively simple text-mining approaches can prove surprisingly effective. We hope that these results will spur further research and serve as a useful benchmark for judging the performance of more sophisticated methods that are yet to be developed.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
Cited by
31 articles.