Affiliation:
1. University of California, San Diego
Abstract
The aim of latent semantic indexing (LSI) is to uncover hidden conceptual relationships between terms and documents. LSI uses the matrix factorization technique known as singular value decomposition (SVD). In this paper, we apply LSI to standard benchmark collections. We find that LSI yields poor retrieval accuracy on the TREC 2, 7, 8, and 2004 collections. We believe that this negative result is robust, because we try more LSI variants than any previous work has. First, we show that using Okapi BM25 weights for terms in documents improves the performance of LSI. Second, we derive novel scoring methods that implement the ideas of query expansion and score regularization in the LSI framework. Third, we show how to combine the BM25 method with LSI methods. All proposed methods are evaluated experimentally on the four TREC collections mentioned above. The experiments show that the new variants of LSI improve upon previous LSI methods. Nevertheless, no way of using LSI achieves a worthwhile improvement in retrieval accuracy over BM25.
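To make the pipeline described in the abstract concrete, the following is a minimal sketch of LSI retrieval over a BM25-weighted term-document matrix: weight the raw counts with Okapi BM25, take a rank-r truncated SVD, fold the query into the latent space, and rank documents by cosine similarity. The toy corpus, the parameter values (k1, b, r), and the query folding-in step are illustrative assumptions, not the paper's exact experimental setup.

```python
# Sketch: LSI (truncated SVD) over a BM25-weighted term-document matrix.
# Toy corpus and parameters are illustrative, not the paper's configuration.
import numpy as np

docs = [
    "latent semantic indexing uses singular value decomposition",
    "bm25 weights terms in documents",
    "singular value decomposition factorizes the term document matrix",
]
query = "singular value decomposition"

# Build vocabulary and raw term-frequency matrix (terms x documents).
vocab = sorted({w for d in docs for w in d.split()})
t_index = {w: i for i, w in enumerate(vocab)}
tf = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        tf[t_index[w], j] += 1

# Okapi BM25 weighting of each term-document entry (k1, b set to common defaults).
k1, b = 1.2, 0.75
doc_len = tf.sum(axis=0)
avg_len = doc_len.mean()
df = (tf > 0).sum(axis=1)
idf = np.log((len(docs) - df + 0.5) / (df + 0.5) + 1.0)
bm25 = idf[:, None] * (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_len))

# LSI: rank-r truncated SVD of the weighted matrix.
r = 2
U, S, Vt = np.linalg.svd(bm25, full_matrices=False)
U_r, S_r, Vt_r = U[:, :r], S[:r], Vt[:r, :]

# Fold the query into the latent space and rank documents by cosine similarity.
q = np.zeros(len(vocab))
for w in query.split():
    if w in t_index:
        q[t_index[w]] += 1
q_latent = np.diag(1.0 / S_r) @ U_r.T @ q   # query in the r-dimensional concept space
doc_latent = Vt_r                            # columns are documents in the concept space
scores = (q_latent @ doc_latent) / (
    np.linalg.norm(q_latent) * np.linalg.norm(doc_latent, axis=0) + 1e-12
)
print(sorted(enumerate(scores), key=lambda x: -x[1]))
```

The same weighted matrix also supports the plain BM25 baseline (score the query directly against the columns of `bm25`), which is the comparison point the abstract refers to.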
Publisher
Association for Computing Machinery (ACM)
Cited by
17 articles.