Concept-Based Information Retrieval Using Explicit Semantic Analysis

Published:2011-04 Issue:2 Volume:29 Page:1-34
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Egozi Ofer¹, Markovitch Shaul¹, Gabrilovich Evgeniy¹

Affiliation:

1. Technion - Israel Institute of Technology

Abstract

Information retrieval systems traditionally rely on textual keywords to index and retrieve documents. Keyword-based retrieval may return inaccurate and incomplete results when different keywords are used to describe the same concept in the documents and in the queries. Furthermore, the relationship between these related keywords may be semantic rather than syntactic, and capturing it thus requires access to comprehensive human world knowledge. Concept-based retrieval methods have attempted to tackle these difficulties by using manually built thesauri, by relying on term cooccurrence data, or by extracting latent word relationships and concepts from a corpus. In this article we introduce a new concept-based retrieval approach based on Explicit Semantic Analysis (ESA), a recently proposed method that augments keyword-based text representation with concept-based features, automatically extracted from massive human knowledge repositories such as Wikipedia. Our approach generates new text features automatically, and we have found that high-quality feature selection becomes crucial in this setting to make the retrieval more focused. However, due to the lack of labeled data, traditional feature selection methods cannot be used, hence we propose new methods that use self-generated labeled training data. The resulting system is evaluated on several TREC datasets, showing superior performance over previous state-of-the-art results.

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/1961209.1961211

References: 71 articles.

1. The ESA retrieval model revisited

2. A study of query length

3. Improvements that don't add up

Cited by 132 articles.

1. Exploring the impact of oil security attention on oil volatility: A new perspective;International Finance;2024-01-09

2. Text similarity detection based on position of words;AIP Conference Proceedings;2024

3. Information Retrieval and Query Expansion for Biomedical Data;Transactions on Computer Systems and Networks;2024

4. ALGAN: Time Series Anomaly Detection with Adjusted-LSTM GAN;2023-11-16

5. A voice search engine for military symbols to enhance the drafting of operational plan documents on digital map;Journal of Military Science and Technology;2023-05-25