Automatic discovery of language models for text databases

Published: 1999-06 Issue: 2 Volume: 28 Page: 479-490
ISSN: 0163-5808
Container-title: ACM SIGMOD Record
Language: en
Short-container-title: SIGMOD Rec.

Author:

Callan Jamie¹, Connell Margaret¹, Du Aiqun¹

Affiliation:

1. Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts, Amherst, Massachusetts

Abstract

The proliferation of text databases within large organizations and on the Internet makes it difficult for a person to know which databases to search. Given language models that describe the contents of each database, a database selection algorithm such as GlOSS can provide assistance by automatically selecting appropriate databases for an information need. Current practice is that each database provides its language model upon request, but this cooperative approach has important limitations. This paper demonstrates that cooperation is not required. Instead, the database selection service can construct its own language models by sampling database contents via the normal process of running queries and retrieving documents. Although random sampling is not possible, it can be approximated with carefully selected queries. This sampling approach avoids the limitations that characterize the cooperative approach, and also enables additional capabilities. Experimental results demonstrate that accurate language models can be learned from a relatively small number of queries and documents.
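
The sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy in-memory `database`, the `search` stand-in for a database's query interface, the function `sample_language_model`, its parameters, and the rule of drawing each new query term uniformly from the terms seen so far are all assumptions made for illustration. The learned "language model" here is simply a count of how many sampled documents contain each term.

```python
import random
from collections import Counter


def search(database, term, top_n):
    """Hypothetical stand-in for a database's normal search interface:
    return up to top_n ids of documents that contain the term."""
    hits = [doc_id for doc_id, text in enumerate(database)
            if term in text.lower().split()]
    return hits[:top_n]


def sample_language_model(database, seed_term, max_docs=4,
                          docs_per_query=2, max_queries=50, rng=None):
    """Learn term -> document frequency from documents retrieved by queries.

    Assumption: the next query term is drawn uniformly from the terms
    already observed in sampled documents.
    """
    rng = rng or random.Random(0)
    seen = set()            # ids of documents sampled so far
    doc_freq = Counter()    # learned model: term -> number of sampled docs
    query = seed_term
    for _ in range(max_queries):
        if len(seen) >= max_docs:
            break
        for doc_id in search(database, query, docs_per_query):
            if len(seen) >= max_docs:
                break
            if doc_id not in seen:
                seen.add(doc_id)
                # Count each term once per document.
                doc_freq.update(set(database[doc_id].lower().split()))
        if not doc_freq:
            break  # the seed term retrieved nothing, so there is nothing to query with
        query = rng.choice(sorted(doc_freq))
    return doc_freq, seen
```

Calling `sample_language_model(["apple banana", "banana cherry", "cherry date"], "apple", max_docs=2)` returns the learned term counts together with the ids of the sampled documents. The stopping rule (a fixed document budget) is one simple choice among several that could be used.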

Publisher

Association for Computing Machinery (ACM)

Subject

Information Systems, Software

Link

https://dl.acm.org/doi/pdf/10.1145/304181.304224

References: 16 articles.

1. Searching distributed collections with inference networks

2. Distributed indexing

3. Evaluating database selection techniques

4. STARTS

Cited by 37 articles.

1. Profiling Web Archival Voids for Memento Routing;2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL);2021-09

2. Big Data Processing;Principles of Distributed Database Systems;2019-12-03

3. Automatic Evaluation of Video Search engines in Persian Web domain based on Majority Voting;Signal and Data Processing;2018-12-01

4. Exploiting Social Annotations to Generate Resource Descriptions in a Distributed Environment: Cooperative Multi-Agent Simulation on Query-Based Sampling;The Review of Socionetwork Strategies;2017-06

5. Efficient distributed selective search;Information Retrieval Journal;2016-11-25