Unsupervised Estimation of Subjective Content Descriptions in an Information System
-
Published:2024-01-30
Issue:
Volume:
Page:1-25
-
ISSN:1793-351X
-
Container-title:International Journal of Semantic Computing
-
language:en
-
Short-container-title:Int. J. Semantic Computing
Author:
Bender Magnus [1],
Braun Tanya [2],
Möller Ralf [1],
Gehrke Marcel [1]
Affiliation:
1. Institute of Information Systems, University of Lübeck, Ratzeburger Allee 160, D-23562 Lübeck, Germany
2. Computer Science Department, University of Münster, Einsteinstraße 62, D-48149 Münster, Germany
Abstract
Let us consider the following scenario: A human is working with a corpus of text documents. In this corpus, the human needs to find documents with similar content and highlight relevant locations in the retrieved documents. An information system that displays the contents of the corpus and provides an information retrieval agent helps the human with this task. To perform information retrieval on the corpus, the agent used internally in the information system may need additional data associated with the documents. To support this, so-called Subjective Content Descriptions (SCDs) provide additional location-specific data for text documents. SCDs are subjective in the sense that the agent associates data with sentences to reflect the beliefs of users. In our scenario, the agent needs SCDs that reference sentences of similar content across various documents in the corpus, but most text documents are not associated with SCDs. Therefore, this paper presents UESM, the Unsupervised Estimator for SCD Matrices, an approach to associate any corpus with SCDs. In an evaluation, we show that UESM performs on par with Latent Dirichlet Allocation in estimating topics of similar content in the corpus, while additionally providing SCDs that reference sentences of similar content.
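To make the idea of SCDs that reference sentences of similar content across documents more concrete, here is a minimal Python sketch under loudly stated assumptions. It is NOT UESM and does not reproduce the estimator from the paper. It swaps in a simple greedy threshold clustering: each hypothetical `SCD` holds a bag-of-words profile and a list of `(doc_id, sentence_index)` locations, and each sentence joins the most similar existing SCD (by cosine similarity) if that similarity reaches a threshold; otherwise it opens a new SCD. The names `SCD`, `cosine`, `group_sentences`, and the `threshold` parameter are illustrative assumptions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SCD:
    """Hypothetical SCD: a word-count profile plus the sentence locations it references."""
    profile: Counter = field(default_factory=Counter)
    locations: list = field(default_factory=list)  # (doc_id, sentence_index) pairs


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_sentences(corpus: dict, threshold: float = 0.5) -> list:
    """Toy greedy clustering of sentences into SCDs (an assumption, NOT the UESM algorithm).

    corpus maps doc_id -> list of sentence strings. Each sentence is assigned to the
    most similar existing SCD if the similarity is at least `threshold`; otherwise
    a new SCD is created for it.
    """
    scds = []
    for doc_id, sentences in corpus.items():
        for idx, sentence in enumerate(sentences):
            words = Counter(sentence.lower().split())  # naive whitespace tokenization
            best, best_sim = None, threshold
            for scd in scds:
                sim = cosine(words, scd.profile)
                if sim >= best_sim:
                    best, best_sim = scd, sim
            if best is None:
                best = SCD()
                scds.append(best)
            best.profile.update(words)  # fold the sentence into the SCD's profile
            best.locations.append((doc_id, idx))
    return scds


# Usage: two documents, each with one sentence per "topic".
corpus = {
    "d1": ["the cat sat on the mat", "stock prices fell sharply"],
    "d2": ["a cat sat on a mat", "prices of stock fell"],
}
scds = group_sentences(corpus, threshold=0.4)
for scd in scds:
    print(scd.locations)
```

With `threshold=0.4`, the two sentences about a cat on a mat end up in one SCD and the two sentences about falling stock prices in another, so each SCD references matching sentence locations in both documents. The fixed threshold and whitespace tokenization are toy choices made for brevity.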
Funder
Understanding Written Artefacts: Material, Interaction and Transmission in Manuscript Cultures
Publisher
World Scientific Pub Co Pte Ltd
Subject
Artificial Intelligence, Computer Networks and Communications, Computer Science Applications, Linguistics and Language, Information Systems, Software