Multiple annotation for biodiversity: developing an annotation framework among biology, linguistics and text technology

Published: 2021-08-04
ISSN: 1574-020X
Journal: Language Resources and Evaluation
Journal (short title): Lang Resources & Evaluation
Language: en

Authors

Andy Lücking, Christine Driller, Manuel Stoeckel, Giuseppe Abrami, Adrian Pachzelt, Alexander Mehler

Abstract

Biodiversity information is contained in countless digitized and unprocessed scholarly texts. Although automated extraction of these data has been gaining momentum for years, there are still innumerable text sources that are poorly accessible and require a more advanced range of methods to extract relevant information. To improve the access to semantic biodiversity information, we have launched the BIOfid project (www.biofid.de) and have developed a portal to access the semantics of German-language biodiversity texts, mainly from the 19th and 20th century. However, to make such a portal work, a couple of methods had to be developed or adapted first. In particular, text-technological information extraction methods were needed, which extract the required information from the texts. Such methods draw on machine learning techniques, which in turn are trained on learning data. To this end, among others, we gathered the BIOfid text corpus, which is a cooperatively built resource, developed by biologists, text technologists, and linguists. A special feature of BIOfid is its multiple annotation approach, which takes into account both general and biology-specific classifications, and by this means goes beyond previous, typically taxon- or ontology-driven proper name detection. We describe the design decisions and the genuine Annotation Hub Framework underlying the BIOfid annotations and present agreement results. The tools used to create the annotations are introduced, and the use of the data in the semantic portal is described. Finally, some general lessons, in particular with multiple annotation projects, are drawn.
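The abstract mentions agreement results for the multiple annotation but does not name the agreement measure. The following is a minimal sketch that assumes Cohen's kappa between two annotators. It also assumes a hypothetical two-layer scheme: a `general` layer and a `biology` layer with made-up labels such as `LOC`, `Taxon` and `Habitat`, which are not the actual BIOfid tag set. Under those assumptions, per-layer agreement on a doubly annotated sentence could be computed like this:

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same sequence of tokens."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of tokens that received the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one and the same label throughout: treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def layer_agreement(annotation_a, annotation_b):
    """Kappa per annotation layer; each input maps a layer name to one label per token."""
    return {layer: cohens_kappa(annotation_a[layer], annotation_b[layer])
            for layer in annotation_a}


# Hypothetical sentence annotated on a general and a biology-specific layer
# (labels are illustrative only, not the BIOfid annotation scheme).
tokens = ["Eichen", "wachsen", "bei", "Frankfurt", "im", "Wald"]
annotator_1 = {
    "general": ["O", "O", "O", "LOC", "O", "O"],
    "biology": ["Taxon", "O", "O", "O", "O", "Habitat"],
}
annotator_2 = {
    "general": ["O", "O", "O", "LOC", "O", "LOC"],
    "biology": ["Taxon", "O", "O", "O", "O", "O"],
}

if __name__ == "__main__":
    for layer, kappa in layer_agreement(annotator_1, annotator_2).items():
        print(f"{layer}: {kappa:.3f}")  # general: 0.571, biology: 0.600
```

Both layers have the same raw agreement of 5/6 here. Kappa discounts the agreement expected by chance from each annotator's label distribution, which is why the two layers receive different scores.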

Funders

Deutsche Forschungsgemeinschaft

Johann Wolfgang Goethe-Universität, Frankfurt am Main

Publisher

Springer Science and Business Media LLC

Subjects

Library and Information Sciences, Linguistics and Language, Education, Language and Linguistics

Link

https://link.springer.com/content/pdf/10.1007/s10579-021-09553-5.pdf

References (72 articles)

1. Abrami, G., & Mehler, A. (2018). A UIMA database interface for managing NLP-related text annotations. In Proceedings of the 11th edition of the Language Resources and Evaluation Conference (LREC 2018), 7–12 May 2018, Miyazaki, Japan.

2. Abrami, G., Mehler, A., Lücking, A., Rieb, E., & Helfrich, P. (2019). TextAnnotator: A flexible framework for semantic annotations. In Proceedings of the Fifteenth Joint ACL - ISO Workshop on Interoperable Semantic Annotation (ISA-15).

3. Abrami, G., Mehler, A., & Stoeckel, M. (2020). TextAnnotator: A web-based annotation suite for texts. In Proceedings of the Digital Humanities 2020 (DH 2020). https://doi.org/10.17613/tenm-4907, https://dh2020.adho.org/wp-content/uploads/2020/07/547_TextAnnotatorAwebbasedannotationsuitefortexts.html.

4. Ahmed, S., Stoeckel, M., Driller, C., Pachzelt, A., & Mehler, A. (2019). BIOfid dataset: Publishing a German gold standard for named entity recognition in historical biodiversity literature. In Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics.

5. Akella, L. M., Norton, C. N., & Miller, H. (2012). NetiNeti: discovery of scientific names from text using machine learning methods. BMC Bioinformatics, 13, 211. https://doi.org/10.1186/1471-2105-13-211.

Cited by 7 articles

1. The Soil Food Web Ontology: Aligning trophic groups, processes, resources, and dietary traits to support food-web research. Ecological Informatics, 2023-12.

2. Formalizing Invertebrate Morphological Data: A Descriptive Model for Cuticle-Based Skeleto-Muscular Systems, an Ontology for Insect Anatomy, and their Potential Applications in Biodiversity Research and Informatics. Systematic Biology, 2023-04-24.

3. The Soil Food Web Ontology: aligning trophic groups, processes, resources, and dietary traits to support food-web research. 2023-02-03.

4. Mobilizing and Enhancing Legacy Biodiversity Data: The case of Karl Wilhelm Verhoeff's correspondence. Biodiversity Information Science and Standards, 2022-08-23.

5. BIOfid Steps Up to Provide Introduced Species Information: The case of myriapods in German greenhouses. Biodiversity Information Science and Standards, 2022-08-23.