Author:
Nguyen Quang Long,Tikk Domonkos,Leser Ulf
Abstract
Abstract
Background
Pattern-based approaches to relation extraction have shown very good results in many areas of biomedical text mining. However, defining the right set of patterns is difficult; approaches are either manual, incurring high cost, or automatic, often resulting in large sets of noisy patterns.
Results
We propose several techniques for filtering sets of automatically generated patterns and analyze their effectiveness for different extraction tasks, as defined in the recent BioNLP 2009 shared task. We focus on simple methods that only take into account the complexity of the pattern and the complexity of the texts the patterns are applied to. We show that our techniques, despite their simplicity, yield large improvements in all tasks we analyzed. For instance, they raise the F-score for the task of extraction gene expression events from 24.8% to 51.9%.
Conclusions
Already very simple filtering techniques may improve the F-score of an information extraction method based on automatically generated patterns significantly. Furthermore, the application of such methods yields a considerable speed-up, as fewer matches need to be analysed. Due to their simplicity, the proposed filtering techniques also should be applicable to other methods using linguistic patterns for information extraction.
Publisher
Springer Science and Business Media LLC
Subject
Computer Networks and Communications,Health Informatics,Computer Science Applications,Information Systems
Reference33 articles.
1. Cohen AM, Hersh WR: A survey of current work in biomedical text mining. Brief Bioinform. 2005, 6: 57-71. 10.1093/bib/6.1.57.
2. Zweigenbaum P, Demner-Fushman D, Yu H, Cohen KB: Frontiers of biomedical text mining: current progress. Brief Bioinform. 2007, 8: 358-375. 10.1093/bib/bbm045.
3. Kao A, Poteet S: Natural Language Processing and Text Mining. 2006, Springer Verlag
4. Chaussabel D, Sher A: Mining microarray expression data by literature profiling. Genome Biol. 2002, 3: research0055-10.1186/gb-2002-3-10-research0055.
5. Stapley BJ, Benoit G: Biobibliometrics: information retrieval and visualization from co-occurrences of gene names in Medline abstracts. Pac Symp Biocomput. 2000, 529-540.
Cited by
11 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献