Interpretable Topic Extraction and Word Embedding Learning Using Non-Negative Tensor DEDICOM
Published: 2021-01-19
Issue: 1
Volume: 3
Pages: 123-167
ISSN: 2504-4990
Container-title: Machine Learning and Knowledge Extraction
Short-container-title: MAKE
Language: en
Authors:
Hillebrand, Lars (ORCID),
Biesner, David (ORCID),
Bauckhage, Christian,
Sifa, Rafet
Abstract
Unsupervised topic extraction is a vital step in automatically distilling concise content information from large text corpora. Existing topic extraction methods lack the capability of modeling relations between topics, which would further aid text understanding. Therefore, we propose utilizing the Decomposition into Directional Components (DEDICOM) algorithm, which provides a uniquely interpretable matrix factorization for symmetric and asymmetric square matrices and tensors. We constrain DEDICOM to row-stochasticity and non-negativity in order to factorize pointwise mutual information matrices and tensors of text corpora. We identify latent topic clusters and their relations within the vocabulary and simultaneously learn interpretable word embeddings. Further, we introduce multiple methods based on alternating gradient descent to efficiently train constrained DEDICOM algorithms. We evaluate the qualitative topic modeling and word embedding performance of our proposed methods on several datasets, including a novel New York Times news dataset, and demonstrate how the DEDICOM algorithm provides deeper text analysis than competing matrix factorization approaches.
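The approach described in the abstract can be sketched in a few lines of numpy: build a positive PMI matrix from word co-occurrence counts, then factorize it as S ≈ A R Aᵀ with A non-negative and row-stochastic, trained by projected alternating gradient descent. This is a minimal illustration, not the authors' exact training procedure; the function names `ppmi` and `constrained_dedicom` and all hyperparameters are assumptions for this sketch.

```python
import numpy as np

def ppmi(counts, eps=1e-12):
    """Positive pointwise mutual information from a co-occurrence count matrix."""
    p_ij = counts / counts.sum()
    p_i = p_ij.sum(axis=1, keepdims=True)
    p_j = p_ij.sum(axis=0, keepdims=True)
    pmi = np.log((p_ij + eps) / (p_i * p_j + eps))
    return np.maximum(pmi, 0.0)

def constrained_dedicom(S, k, steps=1000, lr=1e-3, seed=0):
    """Sketch: factorize S ≈ A R A^T with A non-negative and row-stochastic,
    via projected alternating gradient descent on 0.5 * ||A R A^T - S||_F^2.
    (Illustrative only; the paper's training methods may differ.)"""
    rng = np.random.default_rng(seed)
    n = S.shape[0]
    A = rng.random((n, k))
    A /= A.sum(axis=1, keepdims=True)          # start row-stochastic
    R = rng.random((k, k))                     # asymmetric topic-relation matrix
    for _ in range(steps):
        E = A @ R @ A.T - S                    # residual
        grad_A = E @ A @ R.T + E.T @ A @ R     # gradient w.r.t. A
        A = np.maximum(A - lr * grad_A, 0.0)   # project onto non-negativity
        A /= A.sum(axis=1, keepdims=True) + 1e-12  # project onto row-stochasticity
        E = A @ R @ A.T - S
        grad_R = A.T @ E @ A                   # gradient w.r.t. R
        R = np.maximum(R - lr * grad_R, 0.0)   # keep relations non-negative
    return A, R
```

Under this scheme the rows of A are interpretable word embeddings (distributions over k latent topics), and R encodes directed relations between the topics.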
Funder
Bundesministerium für Bildung, Wissenschaft und Forschung
Subject
General Economics, Econometrics and Finance
Cited by
2 articles.