Abstract
The extraction of information from Dutch archaeological grey literature has recently been investigated by the AGNES project. AGNES aims to disclose relevant information by means of a web search engine, to enable researchers to search through excavation reports. In this paper, we focus on the multi-labelling of archaeological excavation reports with time periods and site types, and provide a manually labelled reference set to this end. We propose a series of approaches, pre-processing methods, and various modifications of the training set to address the often low quality of both texts and labels. We find that despite those issues, our proposed methods lead to promising results.
Publisher
Springer Science and Business Media LLC
Subject
Library and Information Sciences, Linguistics and Language, Education, Language and Linguistics
Cited by
4 articles.