Affiliation:
1. Institute for Applied Mathematics and Information Technologies, National Research Council of Italy (IMATI—CNR), 20133 Milan, Italy
Abstract
Integrating data from different sources raises problems of synonymy, differing languages, and concepts of differing granularity. This paper proposes a simple yet effective approach for evaluating the semantic similarity of short texts, especially keywords. The method matches keywords from different sources and languages by combining transformer-based and WordNet-based methods. Key features of the approach are its unsupervised pipeline, mitigation of the lack of context in keywords, scalability to large archives, support for multiple languages, and adaptability to real-world scenarios. The work aims to provide a versatile tool for different cultural heritage archives without requiring complex customization. Its goals are to explore different approaches to identifying similarities in 1- or n-gram tags, to evaluate and compare different pre-trained language models, and to define integrated methods to overcome limitations. The proposed pipeline was validated through tests on the QueryLab portal, a search engine for cultural heritage archives.
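The keyword-matching idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the `embed` callable, the 0.8 threshold, and the toy vectors are all assumptions made for illustration. In practice `embed` would wrap a pre-trained multilingual language model, and the WordNet-based step mentioned in the abstract is omitted here.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def match_keywords(
    source_a: List[str],
    source_b: List[str],
    embed: Callable[[str], Vector],  # hypothetical: any keyword -> vector function
    threshold: float = 0.8,          # assumed cut-off, not taken from the paper
) -> List[Tuple[str, str, float]]:
    """For each keyword in source_a, return its best match in source_b
    when the cosine similarity reaches the threshold."""
    vectors_b = [(kw, embed(kw)) for kw in source_b]
    matches = []
    for kw_a in source_a:
        vec_a = embed(kw_a)
        best_kw, best_score = None, -1.0
        for kw_b, vec_b in vectors_b:
            score = cosine(vec_a, vec_b)
            if score > best_score:
                best_kw, best_score = kw_b, score
        if best_kw is not None and best_score >= threshold:
            matches.append((kw_a, best_kw, best_score))
    return matches


# Toy stand-in for a multilingual embedding model; the vectors are invented
# purely so the example runs without downloading a model.
TOY_VECTORS = {
    "bread": (1.0, 0.1, 0.0),
    "pane": (0.9, 0.2, 0.0),
    "danza": (0.1, 0.9, 0.0),
    "weaving": (0.0, 0.0, 1.0),
}


def toy_embed(keyword: str) -> Vector:
    return TOY_VECTORS[keyword.lower()]


if __name__ == "__main__":
    # English tags from one archive matched against Italian tags from another.
    for kw_a, kw_b, score in match_keywords(
        ["bread", "weaving"], ["pane", "danza"], toy_embed
    ):
        print(f"{kw_a} -> {kw_b} ({score:.3f})")
```

Passing the embedding function as a parameter keeps the matching logic independent of any particular model, which fits the abstract's stated goal of comparing different pre-trained language models.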
Subject
Artificial Intelligence, Computer Science Applications, Information Systems, Management Information Systems