Abstract
Entities play an essential role in understanding textual documents, whether the documents are short, such as tweets, or long, such as news articles. In short documents, all mentioned entities are usually considered equally important because the amount of information is limited. In long documents, however, not all entities are equally important: some are salient and others are not. Traditional entity topic models (ETMs) focus on incorporating entity information into topic models to better explain the generative process of documents. However, they usually treat all entities equally, without considering whether they are salient. In this work, we propose a novel ETM, the Salient Entity Topic Model, which takes salient entities into account in the document generation process. In particular, we model salient entities as a source of topics used to generate words in documents, in addition to the document-level topic distribution used in traditional topic models. We perform qualitative and quantitative analyses of the proposed model. An application to entity salience detection demonstrates the effectiveness of our model compared with state-of-the-art topic model baselines.
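The abstract describes the generative story only at a high level. The sketch below is a toy illustration under our own assumptions, not the authors' model. For each word, a Bernoulli switch with a hypothetical probability `lam` decides where the topic comes from. It is drawn either from the document's topic distribution (as in LDA) or from the topic distribution of a salient entity chosen uniformly at random. The chosen topic then emits a word. The names `generate_document`, `lam`, `entity_topics`, and `salient_entities`, as well as the switching and uniform-entity mechanisms, are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def sample_categorical(probs: Sequence[float], rng: random.Random) -> int:
    """Draw index i with probability proportional to probs[i]."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(
    n_words: int,
    doc_topics: List[float],                # document-level topic distribution (as in LDA)
    entity_topics: Dict[str, List[float]],  # hypothetical per-entity topic distributions
    salient_entities: List[str],            # entities assumed salient in this document
    topic_words: List[List[float]],         # topic-word distributions over vocabulary indices
    lam: float,                             # hypothetical prob. that a word's topic comes from a salient entity
    rng: random.Random,
) -> List[int]:
    """Toy generative process: returns a list of vocabulary indices."""
    words = []
    for _ in range(n_words):
        if salient_entities and rng.random() < lam:
            # Assumed topic source 1: a salient entity, chosen uniformly at random.
            entity = rng.choice(salient_entities)
            topic = sample_categorical(entity_topics[entity], rng)
        else:
            # Topic source 2: the document's own topic distribution.
            topic = sample_categorical(doc_topics, rng)
        # Emit a word from the chosen topic's word distribution.
        words.append(sample_categorical(topic_words[topic], rng))
    return words


# Usage: two topics, a three-word vocabulary, and one hypothetical salient entity.
rng = random.Random(0)
doc = generate_document(
    n_words=10,
    doc_topics=[0.5, 0.5],
    entity_topics={"entity_A": [0.9, 0.1]},
    salient_entities=["entity_A"],
    topic_words=[[0.7, 0.3, 0.0], [0.0, 0.2, 0.8]],
    lam=0.5,
    rng=rng,
)
print(doc)
```

Setting `lam=0` in this sketch reduces it to plain LDA-style generation from the document topic distribution. Raising `lam` shifts more of the word generation toward the topics of the salient entities.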
Publisher
Cambridge University Press (CUP)
Subject
Artificial Intelligence, Linguistics and Language, Language and Linguistics, Software
Cited by
4 articles.
1. From Text to Context: An Entailment Approach for News Stakeholder Classification; Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval; 2024-07-10
2. Towards Self-Contained Answers: Entity-Based Answer Rewriting in Conversational Search; Proceedings of the 2024 ACM SIGIR Conference on Human Information Interaction and Retrieval; 2024-03-10
3. An unsupervised perplexity-based method for boilerplate removal; Natural Language Engineering; 2023-02-21
4. MultiLayerET: A Unified Representation of Entities and Topics Using Multilayer Graphs; Machine Learning and Knowledge Discovery in Databases; 2023