Affiliation:
1. Carnegie Mellon Univ.
2. Univ., of Massachusetts
Abstract
The proliferation of searchable text databases on corporate networks and the Internet causes a database selection problem for many people. Algorithms such as gGLOSS and CORI can automatically select which text databases to search for a given information need, but only if given a set of resource descriptions that accurately represent the contents of each database. The existing techniques for a acquiring resource descriptions have significant limitations when used in wide-area networks controlled by many parties. This paper presents
query-based sampling
, a new technicque for acquiring accurate resource descriptions. Query-based sampling does not require the cooperation of resource providers, nor does it require that resource providers use a particular search engine or representation technique. An extensive set of experimental results demonstrates that accurate resource descriptions are crated, that computation and communication costs are reasonable, and that the resource descriptions do in fact enable accurate automatic dtabase selection.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications,General Business, Management and Accounting,Information Systems
Cited by
175 articles.
订阅此论文施引文献
订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献