Automatically evaluating the quality of textual descriptions in cultural heritage records

Published: 2021-04-23 Issue: 2 Volume: 22 Page: 217-231
ISSN: 1432-5012
Container-title: International Journal on Digital Libraries
Language: en
Short-container-title: Int J Digit Libr

Author:

Lorenzini Matteo, Rospocher Marco, Tonelli Sara

Abstract

Metadata are fundamental for the indexing, browsing and retrieval of cultural heritage resources in repositories, digital libraries and catalogues. In order to be effectively exploited, metadata information has to meet some quality standards, typically defined in the collection usage guidelines. As manually checking the quality of metadata in a repository may not be affordable, especially in large collections, in this paper we specifically address the problem of automatically assessing the quality of metadata, focusing in particular on textual descriptions of cultural heritage items. We describe a novel approach based on machine learning that tackles this problem by framing it as a binary text classification task aimed at evaluating the accuracy of textual descriptions. We report our assessment of different classifiers using a new dataset that we developed, containing more than 100K descriptions. The dataset was extracted from different collections and domains from the Italian digital library “Cultura Italia” and was annotated with accuracy information in terms of compliance with the cataloguing guidelines. The results empirically confirm that our proposed approach can effectively support curators (F1 ∼ 0.85) in assessing the quality of the textual descriptions of the records in their collections and provide some insights into how training data, specifically their size and domain, can affect classification performance.
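The abstract reports an F1 score for a binary classification task. As a minimal sketch of how such a score is computed, the snippet below derives F1 from gold and predicted labels. The function name `binary_f1`, the label convention (1 for a description that complies with the cataloguing guidelines, 0 otherwise) and the toy labels are assumptions made for illustration; they are not taken from the paper or its dataset.

```python
from typing import Sequence


def binary_f1(gold: Sequence[int], predicted: Sequence[int], positive: int = 1) -> float:
    """Return the F1 score of the `positive` class for two label sequences."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # Count true positives, false positives and false negatives.
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)

    # With no true positives, precision and recall are 0 or undefined;
    # F1 is conventionally reported as 0 in that case.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = compliant description, 0 = non-compliant.
gold = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 1, 0, 0, 1, 1, 0, 1]

print(round(binary_f1(gold, predicted), 2))  # 0.8 for this toy data
```

In this toy case there are 4 true positives, 1 false positive and 1 false negative, so precision and recall are both 0.8 and so is F1. Which class counts as positive, and how scores are averaged over classes, changes the number, so the figure above is not comparable to the one reported in the abstract.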

Funder

Università degli Studi di Verona

Publisher

Springer Science and Business Media LLC

Subject

Library and Information Sciences

Link

https://link.springer.com/content/pdf/10.1007/s00799-021-00302-1.pdf

References: 39 articles.

1. Adankon, M.M., Cheriet, M.: Support Vector Machine, pp. 1303–1308. Springer US, Boston, MA (2009). https://doi.org/10.1007/978-0-387-73003-5_299

2. Aprosio, A.P., Moretti, G.: Tint 2.0: an all-inclusive suite for NLP in Italian. In: Cabrio, E., Mazzei, A., Tamburini, F. (eds.) Proceedings of the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018), Torino, Italy, December 10–12, 2018, CEUR Workshop Proceedings, vol. 2253. CEUR-WS.org (2018). http://ceur-ws.org/Vol-2253/paper58.pdf

3. Bizer, C., Cyganiak, R.: Quality-driven information filtering using the WIQA policy framework. Web Semant. 7(1), 1–10 (2009). https://doi.org/10.1016/j.websem.2008.02.005

4. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 5, 135–146 (2017). https://www.aclweb.org/anthology/Q17-1010/

5. Bruce, T.R., Hillmann, D.I.: The continuum of metadata quality: defining, expressing, exploiting. ALA editions (2004). https://ecommons.cornell.edu/handle/1813/7895

Cited by 1 article.

1. MODELOS DE DIAGNÓSTICO DE QUALIDADE DE DADOS NO DOMÍNIO DO PATRIMÔNIO CULTURAL: UMA REVISÃO SISTEMÁTICA DE LITERATURA;Perspectivas em Ciência da Informação;2023