One tagger, many uses: Illustrating the power of ontologies in dictionary-based named entity recognition

Published: 2016-08-02

Author:

Jensen Lars Juhl

Abstract

Automatic annotation of text is an important complement to manual annotation, because the latter is highly labour-intensive. We have developed a fast dictionary-based named entity recognition (NER) system and addressed a wide variety of biomedical problems by applying it to text from many different sources. We have used this tagger both in real-time tools to support curation efforts and in pipelines for populating databases through bulk processing of the entire Medline, the open-access subset of PubMed Central, NIH grant abstracts, FDA drug labels, electronic health records, and the Encyclopedia of Life. Despite the simplicity of the approach, it typically achieves 80–90% precision and 70–80% recall. Many of the underlying dictionaries were built from open biomedical ontologies, which further facilitate integration of the text-mining results with evidence from other sources.

Publisher

Cold Spring Harbor Laboratory

References: 25 articles.

1. The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text

2. S. Pyysalo et al., “Sharing annotations better: RESTful Open Annotation,” Proc. ACL-IJCNLP, pp. 91–96, 2015.

3. Reflect: augmented browsing for the life scientist

4. E. Pafilis et al., “EXTRACT: Interactive extraction of environment metadata and term suggestion for metagenomic sample annotation,” Proc. BioCreative Challenge Evaluation Workshop, pp. 384–395, 2015.

5. The gene normalization task in BioCreative III, BMC Bioinformatics, 2011.

Cited by 11 articles.

1. Lifestyle factors in the biomedical literature: comprehensive resources for named entity recognition; 2024-06-16

2. CoNECo: A Corpus for Named Entity recognition and normalization of protein Complexes; 2024-05-21

3. RegulaTome: a corpus of typed, directed, and signed relations between biomedical entities in the scientific literature; 2024-05-02

4. STRING-ing together protein complexes: extracting physical protein interactions from the literature; 2023-12-11

5. Improving dictionary-based named entity recognition with deep learning; 2023-12-11