Stochastic Query Covering for Fast Approximate Document Retrieval

Published:2015-03-23 Issue:3 Volume:33 Page:1-35
ISSN:1046-8188
Container-title:ACM Transactions on Information Systems
language:en
Short-container-title:ACM Trans. Inf. Syst.

Author:

Anagnostopoulos Aris¹, Becchetti Luca¹, Bordino Ilaria², Leonardi Stefano¹, Mele Ida¹, Sankowski Piotr³

Affiliation:

1. Sapienza, University of Rome, Italy

2. Yahoo Labs, Barcelona, Spain

3. University of Warsaw, Poland

Abstract

We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection in such a way that we can efficiently provide high-quality answers to user queries using only the selected subset. This approach has applications when space is a constraint or when the query-processing time increases significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that even though they use only a small fraction of the entire collection, they can provide answers to most user queries, achieving a performance close to the optimal. To complement our theoretical findings, we experimentally show the versatility of our approach by considering two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query, whereas in the second case we aim for document diversification. Both the theoretical and the experimental analysis provide strong evidence of the potential value of query covering in diverse application scenarios.
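The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the query-covering idea, not the authors' method. It assumes a simplified model in which a query counts as "covered" when at least one of its relevant documents is kept, and it uses the standard greedy heuristic for weighted maximum coverage. The names `greedy_query_cover`, `covered_mass`, `query_prob`, `relevant_docs`, and `budget` are hypothetical.

```python
from typing import Dict, List, Set


def greedy_query_cover(
    query_prob: Dict[str, float],
    relevant_docs: Dict[str, Set[str]],
    budget: int,
) -> List[str]:
    """Greedily pick up to `budget` documents to maximize covered query probability.

    Assumption (not from the paper): a query is covered once any one of its
    relevant documents has been selected.
    """
    # Invert the mapping: document -> set of queries it can answer.
    answers: Dict[str, Set[str]] = {}
    for query, docs in relevant_docs.items():
        for doc in docs:
            answers.setdefault(doc, set()).add(query)

    selected: List[str] = []
    covered: Set[str] = set()
    for _ in range(budget):
        best_doc, best_gain = None, 0.0
        for doc in sorted(answers):  # sorted for deterministic tie-breaking
            if doc in selected:
                continue
            # Marginal gain: probability mass of queries this document newly covers.
            gain = sum(query_prob.get(q, 0.0) for q in answers[doc] - covered)
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:  # no remaining document adds coverage
            break
        selected.append(best_doc)
        covered |= answers[best_doc]
    return selected


def covered_mass(
    query_prob: Dict[str, float],
    relevant_docs: Dict[str, Set[str]],
    selected: List[str],
) -> float:
    """Total probability of queries answerable from the selected subset."""
    chosen = set(selected)
    return sum(
        p for q, p in query_prob.items() if relevant_docs.get(q, set()) & chosen
    )
```

For instance, with three queries of probability 0.5, 0.3, and 0.2, where one document is relevant to the first two queries, a budget of a single document already covers most of the query mass under this toy model.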

Funder

EU FET projects MULTIPLEX 317532 and SIMPOL 610704

EU FP7 project 255403-SNAPS

PRIN 2008 research projects COGENT (COmputational and GamE-theoretic aspects of uncoordinated NeTworks) and Mad Web (Models, Algorithms and Data structures for the Web and other behavioral networks)

EU ERC StG project PAAl 259515

Italian Ministry of University and Research

Google Focused Award “Algorithms for Large-Scale Data Analysis”

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/2699671

References: 52 articles.

1. Join synopses for approximate query answering

2. Diversifying search results

3. The online set cover problem

4. Stochastic query covering

5. Graphs from Search Engine Queries

Cited by 7 articles.

1. An Efficient and Robust Semantic Hashing Framework for Similar Text Search;ACM Transactions on Information Systems;2023-01-30

2. A master-apprentice evolutionary algorithm for maximum weighted set K-covering problem;Applied Intelligence;2022-05-04

3. Topical result caching in web search engines;Information Processing & Management;2020-05

4. Better Streaming Algorithms for the Maximum Coverage Problem;Theory of Computing Systems;2018-07-23

5. A Dynamic and Context-Aware Social Network Approach for Multiple Criteria Decision Making Through a Graph-Based Knowledge Learning;Advances in Wireless Technologies and Telecommunication;2018