Abstract
Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80–90% precision and 70–80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.
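To make the idea of dictionary-based NER concrete, the following is a minimal sketch and not the tagger described above. It assumes a toy dictionary, a simple word tokenizer, and greedy longest-match lookup. The names `DICTIONARY`, `tag`, and `precision_recall`, as well as the identifiers in the dictionary, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy dictionary: lowercase name -> identifier (illustrative only).
DICTIONARY = {
    "tp53": "GENE:TP53",
    "tumor protein p53": "GENE:TP53",
    "homo sapiens": "TAXON:9606",
    "human": "TAXON:9606",
}
# Longest dictionary name, in words; bounds the lookup window.
MAX_WORDS = max(len(name.split()) for name in DICTIONARY)


def tag(text):
    """Return (start, end, matched_text, identifier) for each dictionary hit."""
    # Character offsets of every word token in the text.
    tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    hits = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate first so multi-word names win.
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            start, end = tokens[i][0], tokens[i + n - 1][1]
            candidate = " ".join(text[s:e].lower() for s, e in tokens[i:i + n])
            if candidate in DICTIONARY:
                hits.append((start, end, text[start:end], DICTIONARY[candidate]))
                i += n  # skip past the matched words
                matched = True
                break
        if not matched:
            i += 1
    return hits


def precision_recall(predicted, gold):
    """Precision and recall of a set of predicted annotations against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Calling `tag("Tumor protein p53 is mutated in many human cancers.")` in this sketch yields one hit for the three-word gene name and one for "human". A production system would additionally need handling for orthographic variation, ambiguous names, and blocklisted terms, none of which is shown here.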
Publisher
Cold Spring Harbor Laboratory