A semisupervised learning method to merge search engine results

Published: 2003-10 Issue: 4 Volume: 21 Page: 457-491
ISSN: 1046-8188
Container-title: ACM Transactions on Information Systems
Language: en
Short-container-title: ACM Trans. Inf. Syst.

Author:

Luo Si¹, Jamie Callan¹

Affiliation:

1. Carnegie Mellon University, Pittsburgh, PA

Abstract

The proliferation of searchable text databases on local area networks and the Internet causes the problem of finding information that may be distributed among many disjoint text databases (distributed information retrieval). How to merge the results returned by selected databases is an important subproblem of the distributed information retrieval task. Previous research assumed that either resource providers cooperate to provide normalizing statistics or search clients download all retrieved documents and compute normalized scores without cooperation from resource providers.

This article presents a semisupervised learning solution to the result merging problem. The key contribution is the observation that information used to create resource descriptions for resource selection can also be used to create a centralized sample database to guide the normalization of document scores returned by different databases. At retrieval time, the query is sent to the selected databases, which return database-specific document scores, and to a centralized sample database, which returns database-independent document scores. Documents that have both a database-specific score and a database-independent score serve as training data for learning to normalize the scores of other documents. An extensive set of experiments demonstrates that this method is more effective than the well-known CORI result-merging algorithm under a variety of conditions.
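
The merging step described in the abstract can be illustrated with a minimal sketch. It assumes that a simple per-database linear model (fit by least squares) maps database-specific scores onto database-independent scores; the abstract itself does not name the model, so this is an illustration rather than the article's exact procedure. The function names `fit_linear_map` and `merge_results`, and the data layout, are hypothetical.

```python
from statistics import mean


def fit_linear_map(pairs):
    """Least-squares fit of y = a*x + b.

    pairs: list of (database_specific_score, database_independent_score)
    for documents that appear in both result lists (the training data).
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct database-specific scores")
    a = sum((x - mx) * (y - my) for x, y in pairs) / sxx
    b = my - a * mx
    return a, b


def merge_results(db_results, central_scores):
    """Merge per-database rankings into one list.

    db_results: {db_name: {doc_id: database_specific_score}} (assumed layout)
    central_scores: {doc_id: database_independent_score} returned by the
        centralized sample database for the same query (assumed layout)
    """
    merged = []
    for scored in db_results.values():
        # Documents with both kinds of score serve as training data.
        overlap = [(s, central_scores[d]) for d, s in scored.items()
                   if d in central_scores]
        a, b = fit_linear_map(overlap)
        # Apply the learned mapping to every document from this database.
        merged.extend((doc_id, a * s + b) for doc_id, s in scored.items())
    merged.sort(key=lambda item: item[1], reverse=True)
    return merged


# Toy data: two databases whose scores are on incompatible scales.
db_results = {
    "A": {"a1": 0.9, "a2": 0.5, "a3": 0.7},
    "B": {"b1": 40.0, "b2": 20.0, "b3": 30.0},
}
central_scores = {"a1": 0.8, "a2": 0.4, "b1": 0.7, "b2": 0.3}

if __name__ == "__main__":
    for doc_id, score in merge_results(db_results, central_scores):
        print(doc_id, round(score, 3))
```

In this toy run, documents `a3` and `b3` have no database-independent score of their own; their merged positions come entirely from the mappings learned on the overlapping documents.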

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Science Applications; General Business, Management and Accounting; Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/944012.944017

References: 32 articles.

1. Models for metasearch

2. Query-based sampling of text databases

3. TREC and TIPSTER experiments with inquery

Cited by 68 articles.

1. Federated search techniques: an overview of the trends and state of the art;Knowledge and Information Systems;2023-07-10

2. Machine learning methods for results merging in patent retrieval;Data Technologies and Applications;2023-02-27

3. Snippet-based result merging in federated search;Journal of Information Science;2023-01-12

4. End to End Neural Retrieval for Patent Prior Art Search;Lecture Notes in Computer Science;2022

5. Multi-agent-based hybrid peer-to-peer system for distributed information retrieval;Journal of Information Science;2021-05-11