Probabilistic Latent Semantic Indexing

Published:2017-08-02 Issue:2 Volume:51 Page:211-218
ISSN:0163-5840
Container-title:ACM SIGIR Forum
language:en
Short-container-title:SIGIR Forum

Author:

Hofmann Thomas¹

Affiliation:

1. International Computer Science Institute, Berkeley, CA & EECS Department, CS Division, UC Berkeley

Abstract

Probabilistic Latent Semantic Indexing is a novel approach to automated document indexing which is based on a statistical latent class model for factor analysis of count data. Fitted from a training corpus of text documents by a generalization of the Expectation Maximization algorithm, the utilized model is able to deal with domain-specific synonymy as well as with polysemous words. In contrast to standard Latent Semantic Indexing (LSI) by Singular Value Decomposition, the probabilistic variant has a solid statistical foundation and defines a proper generative data model. Retrieval experiments on a number of test collections indicate substantial performance gains over direct term matching methods as well as over LSI. In particular, the combination of models with different dimensionalities has proven to be advantageous.
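
As a rough illustration of the kind of latent class model the abstract describes, the following is a minimal sketch of plain EM for a symmetric aspect model P(d, w) = sum_z P(z) P(d|z) P(w|z) fitted to a document-term count matrix. It is not the paper's implementation: the abstract refers to a generalization of EM, which is not reproduced here, and the function name plsa_em, its parameters, and the toy count matrix are all assumptions made for this example.

```python
import numpy as np


def plsa_em(counts, n_topics, n_iter=50, seed=0):
    """Plain EM for a latent class (aspect) model on a count matrix.

    counts: array of shape (n_docs, n_words) with term frequencies.
    Returns P(z), P(d|z), P(w|z). Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape

    # Random positive initialisation, normalised to valid distributions.
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.random((n_topics, n_docs))
    p_d_z /= p_d_z.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: posterior P(z | d, w), array of shape (topics, docs, words).
        joint = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
        post = joint / np.maximum(joint.sum(axis=0, keepdims=True), 1e-12)

        # M-step: re-estimate parameters from expected counts n(d, w) P(z | d, w).
        weighted = counts[None, :, :] * post
        p_w_z = weighted.sum(axis=1)
        p_d_z = weighted.sum(axis=2)
        p_z = weighted.sum(axis=(1, 2))

        p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
        p_d_z /= np.maximum(p_d_z.sum(axis=1, keepdims=True), 1e-12)
        p_z /= p_z.sum()

    return p_z, p_d_z, p_w_z


# Toy corpus (made up): 4 documents, 4 terms, two loosely separated groups.
toy_counts = np.array(
    [[4, 3, 0, 0],
     [5, 2, 0, 1],
     [0, 0, 6, 3],
     [0, 1, 4, 5]],
    dtype=float,
)
p_z, p_d_z, p_w_z = plsa_em(toy_counts, n_topics=2)
```

The dense (topics, docs, words) posterior array keeps the sketch short but is only practical for small matrices; a realistic corpus would need a sparse formulation that visits only the nonzero counts.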

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Management Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/3130348.3130370

Reference 16 articles.

1. Maximum likelihood from incomplete data via the EM algorithm;Dempster A.;J. Royal Statist. Soc. B,1977

Cited by 105 articles.

1. MLTracer: An Approach Based on Multi-Layered Gradient Boosting Decision Trees for Requirements Traceability Recovery;2024 International Joint Conference on Neural Networks (IJCNN);2024-06-30

2. KD SENSO-MERGER: An architecture for semantic integration of heterogeneous data;Engineering Applications of Artificial Intelligence;2024-06

3. CRAS: cross-domain recommendation via aspect-level sentiment extraction;Knowledge and Information Systems;2024-05-18

4. Fault Diagnosis Method for Railway Signal Equipment Based on Data Enhancement and an Improved Attention Mechanism;Machines;2024-05-13

5. Visualization and Analysis of Social Network-Based Diverse Datasets Using Multi-Viewpoint Similarity Metrics;2024 International Conference on Expert Clouds and Applications (ICOECA);2024-04-18