Abstract
Purpose
The purpose of this paper is to propose a methodology for enriching and tailoring a knowledge organization system (KOS) to support the information extraction (IE) task in the analysis of tourism-domain documents. In particular, the KOS is used to develop a named entity recognition (NER) system.
Design/methodology/approach
First, a method is presented to improve and customize an available thesaurus by leveraging documents related to tourism in Italy. The resulting thesaurus is then used to create an annotated NER corpus by combining distant supervision, deep learning and light human supervision.
Findings
The study shows that a customized KOS can effectively support IE tasks when applied to documents belonging to the same domains and types used for its construction. Moreover, the customized KOS proves very useful for supporting and easing the annotation task under the proposed methodology, allowing a corpus to be annotated with a fraction of the effort required for manual annotation.
Originality/value
The paper explores an alternative use of a KOS, proposing an innovative NER corpus annotation methodology. Moreover, the KOS and the annotated NER data set will be made publicly available.
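To make the distant supervision step more concrete, the following is a minimal sketch of how thesaurus terms could be projected onto tokenized text as BIO tags. It is an illustration only, not the authors' implementation: the `THESAURUS` entries, the entity type names and the `distant_annotate` function are all hypothetical, and the deep learning and human revision stages mentioned in the abstract are not covered.

```python
from typing import Dict, List, Tuple

# Hypothetical thesaurus entries (assumption, not taken from the paper):
# lower-cased term tokens -> entity type.
THESAURUS: Dict[Tuple[str, ...], str] = {
    ("colosseum",): "ATTRACTION",
    ("amalfi", "coast"): "LOCATION",
    ("rome",): "LOCATION",
}


def distant_annotate(
    tokens: List[str], thesaurus: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Assign BIO tags by greedy longest-match lookup of thesaurus terms."""
    max_len = max((len(term) for term in thesaurus), default=0)
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = thesaurus.get(tuple(lowered[i:i + n]))
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


if __name__ == "__main__":
    sentence = ["We", "visited", "the", "Colosseum", "and", "the", "Amalfi", "Coast"]
    for token, tag in zip(sentence, distant_annotate(sentence, THESAURUS)):
        print(f"{token}\t{tag}")
```

Labels produced this way are noisy (ambiguous terms, missing variants), which is consistent with the abstract's point that a light human supervision pass is still part of the workflow.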
Subject
Library and Information Sciences, Information Systems
Cited by
4 articles.