Affiliation:
1. Columbia Univ., New York, NY
2. Stanford Univ., Stanford, CA
3. INRIA Rocquencourt, Le Chesnay, France
Abstract
The dramatic growth of the Internet has created a new problem for users: location of the relevant sources of documents. This article presents a framework for (and experimentally analyzes a solution to) this problem, which we call the text-source discovery problem. Our approach consists of two phases. First, each text source exports its contents to a centralized service. Second, users present queries to the service, which returns an ordered list of promising text sources. This article describes GlOSS, Glossary of Servers Server, with two versions: bGlOSS, which provides a Boolean query retrieval model, and vGlOSS, which provides a vector-space retrieval model. We also present hGlOSS, which provides a decentralized version of the system. We extensively describe the methodology for measuring the retrieval effectiveness of these systems and provide experimental evidence, based on actual data, that all three systems are highly effective in determining promising text sources for a given query.
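The two-phase approach described in the abstract can be illustrated with a minimal sketch for the Boolean case. It is not the article's implementation. It assumes that what each source exports is a document count plus per-term document frequencies, and that query terms occur independently within a source; both are assumptions made here for illustration, since the abstract does not specify them. The names `SourceSummary`, `estimate_matches`, and `rank_sources` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class SourceSummary:
    """Hypothetical summary that a text source exports to the central service."""
    name: str
    num_docs: int
    doc_freq: Dict[str, int]  # term -> number of documents containing the term


def estimate_matches(summary: SourceSummary, query_terms: List[str]) -> float:
    """Estimate how many documents contain ALL query terms (Boolean AND).

    Assumption for this sketch: terms occur independently, so the estimate
    is num_docs multiplied by each term's document-frequency fraction.
    """
    if summary.num_docs == 0:
        return 0.0
    estimate = float(summary.num_docs)
    for term in query_terms:
        estimate *= summary.doc_freq.get(term, 0) / summary.num_docs
    return estimate


def rank_sources(
    summaries: List[SourceSummary], query_terms: List[str]
) -> List[Tuple[str, float]]:
    """Phase two: return sources ordered by estimated matches, best first.

    Sources with an estimate of zero are left out of the ranking.
    """
    scored = [(s.name, estimate_matches(s, query_terms)) for s in summaries]
    promising = [pair for pair in scored if pair[1] > 0]
    return sorted(promising, key=lambda pair: pair[1], reverse=True)


# Phase one: each source has exported its (made-up) summary to the service.
summaries = [
    SourceSummary("db_a", 1000, {"distributed": 100, "retrieval": 50}),
    SourceSummary("db_b", 200, {"distributed": 100, "retrieval": 80}),
    SourceSummary("db_c", 500, {"retrieval": 400}),
]

# Phase two: a user query is answered with a ranked list of sources.
ranking = rank_sources(summaries, ["distributed", "retrieval"])
for name, score in ranking:
    print(name, round(score, 2))
```

With these made-up summaries, `db_b` ranks ahead of `db_a` even though it holds fewer documents, because a larger fraction of its documents contain both terms. `db_c` is omitted because it reports no documents containing "distributed". The service answers the query from the summaries alone, without contacting the sources.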
Publisher
Association for Computing Machinery (ACM)
Cited by
112 articles.
1. A Multi-Dimensional Source Selection Based on Topic Modelling;J INF SCI ENG;2022
2. Dynamic Shard Cutoff Prediction for Selective Search;The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval;2018-06-27
3. Approximate Queries in Peer-to-Peer Systems;Encyclopedia of Database Systems;2018
4. Improving Shard Selection for Selective Search;Information Retrieval Technology;2017
5. Selection of Information Sources Using a Genetic Algorithm;Advances in Intelligent Systems and Computing;2017