An extensive review of tools for manual annotation of documents

Published: 2019-12-15 Issue: 1 Volume: 22 Page: 146-163
ISSN: 1477-4054
Container-title: Briefings in Bioinformatics
Language: en

Author:

Mariana Neves¹, Jurica Ševa¹

Affiliation:

1. German Centre for the Protection of Laboratory Animals (Bf3R), German Federal Institute for Risk Assessment (BfR), Berlin, Germany

Abstract

Motivation: Annotation tools are applied to build training and test corpora, which are essential for the development and evaluation of new natural language processing algorithms. Further, annotation tools are also used to extract new information for a particular use case. However, owing to the high number of existing annotation tools, finding the one that best fits particular needs is a demanding task that requires searching the scientific literature followed by installing and trying various tools.

Methods: We searched for annotation tools and selected a subset of them according to five requirements with which they should comply, such as being Web-based or supporting the definition of a schema. We installed the selected tools (when necessary), carried out hands-on experiments and evaluated them using 26 criteria that covered functional and technical aspects. We defined each criterion on three levels of matches and a score for the final evaluation of the tools.

Results: We evaluated 78 tools and selected the following 15 for a detailed evaluation: BioQRator, brat, Catma, Djangology, ezTag, FLAT, LightTag, MAT, MyMiner, PDFAnno, prodigy, tagtog, TextAE, WAT-SL and WebAnno. Full compliance with our 26 criteria ranged from only 9 up to 20 criteria, which demonstrated that some tools are comprehensive and mature enough to be used on most annotation projects. The highest score of 0.81 was obtained by WebAnno (of a maximum value of 1.0).

Publisher

Oxford University Press (OUP)

Subject

Molecular Biology, Information Systems

Link

http://academic.oup.com/bib/article-pdf/22/1/146/35934686/bbz130.pdf

References (104 articles)

1. Baker et al. Automatic semantic classification of scientific literature according to the hallmarks of cancer. Bioinformatics, 2016.

2. Habibi et al. Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics, 2017.

3. Lee et al. Deep learning of mutation-gene-drug relations from the literature. BMC Bioinform, 2018.

4. Liakata. Corpora for the conceptualisation and zoning of scientific papers. In: Calzolari N (Conference Chair), Choukri K, Maegaard B, Mariani J, Odijk J, Piperidis S, Rosner M and Tapias D (eds). Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), Valletta, Malta, May 2010. European Language Resources Association (ELRA).

Cited by 71 articles.

1. LLM-AIx: An open source pipeline for Information Extraction from unstructured medical text based on privacy preserving Large Language Models. 2024-09-03.

2. LSD600: the first corpus of biomedical abstracts annotated with lifestyle–disease relations. 2024-08-31.

3. Identifying signs and symptoms of urinary tract infection from emergency department clinical notes using large language models. Academic Emergency Medicine. 2024-04-03.

4. Innovative agricultural ontology construction using NLP methodologies and graph neural network. Engineering Science and Technology, an International Journal. 2024-04.

5. MetaTron: advancing biomedical annotation empowering relation annotation and collaboration. BMC Bioinformatics. 2024-03-14.