Embedding based learning for collection selection in federated search

Published:2020-10-28 Issue:5 Volume:54 Page:703-717
ISSN:2514-9288
Container-title:Data Technologies and Applications
Language:en
Short-container-title:DTA

Author:

Garba Adamu, Khalid Shah, Ullah Irfan, Khusro Shah, Mumin Diyawu

Abstract

Purpose: Search engines face many challenges in crawling the deep web because of its proprietary nature or dynamic content. Distributed Information Retrieval (DIR) tries to solve these problems by providing a unified searchable interface to these databases. Since a DIR system must search across many databases, selecting the specific databases to search for a given user query is challenging. The challenge can be addressed if users' past queries are considered, in combination with word embedding techniques, when selecting the collections to search. Combining the two would help the best-performing collection selection method speed up retrieval in DIR solutions.

Design/methodology/approach: The authors propose a collection selection model based on word embedding, using the Word2Vec approach, that learns the similarity between the current and past queries. They use the cosine and transformed cosine similarity models to compute the similarities among queries. The experiment is conducted using three standard TREC testbeds created for federated search.

Findings: The results show significant improvements over the baseline models.

Originality/value: Although lexical matching models for collection selection using similarity based on past queries exist, to the best of our knowledge, the proposed work is the first of its kind that uses word embedding for collection selection by learning from past queries.
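The general idea in the abstract, scoring collections by how similar the current query is to past queries in an embedding space, can be illustrated with a minimal sketch. This is not the authors' model: the abstract does not give the scoring formula or define the transformed cosine similarity, so only plain cosine similarity is shown. The toy vectors (`TOY_VECTORS`, standing in for trained Word2Vec vectors), the averaging of word vectors into a query vector, the additive scoring rule, and the names `embed_query`, `cosine`, and `rank_collections` are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical toy word vectors standing in for a trained Word2Vec model.
TOY_VECTORS = {
    "heart": [0.9, 0.1, 0.0],
    "cardiac": [0.8, 0.2, 0.0],
    "disease": [0.7, 0.3, 0.1],
    "stock": [0.0, 0.9, 0.2],
    "market": [0.1, 0.8, 0.3],
}


def embed_query(query, vectors):
    """Average the vectors of the known words (an assumed pooling choice).

    Returns None when no word of the query is in the vocabulary.
    """
    known = [vectors[w] for w in query.lower().split() if w in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a, b):
    """Plain cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_collections(query, past_queries, vectors):
    """Rank collections by summed similarity to past queries.

    past_queries is a list of (query_text, collection_ids) pairs, where
    collection_ids are the collections that were useful for that query.
    The additive scoring rule is an assumption, not taken from the paper.
    """
    q_vec = embed_query(query, vectors)
    if q_vec is None:
        return []
    scores = defaultdict(float)
    for text, collections in past_queries:
        p_vec = embed_query(text, vectors)
        if p_vec is None:
            continue
        similarity = cosine(q_vec, p_vec)
        for collection in collections:
            scores[collection] += similarity
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical query log: past queries and the collections that served them.
PAST_QUERIES = [
    ("heart disease", ["medical"]),
    ("stock market", ["finance"]),
]

ranking = rank_collections("cardiac disease", PAST_QUERIES, TOY_VECTORS)
print(ranking)
```

With these toy vectors the new query shares no word with "heart disease" except "disease", yet "cardiac" lies close to "heart" in the embedding space, so the medical collection ranks first. That is the kind of match a purely lexical comparison of queries could miss.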

Publisher

Emerald

Subject

Library and Information Sciences,Information Systems

References: 33 articles.

1. A comparison of deep learning based query expansion with pseudo-relevance feedback and mutual information,2016

2. Classification-based resource selection,2009

3. The FedLemur: federated search in the real world;Journal of the American Society for Information Science and Technology,2006

4. Knowledge based collection selection for distributed information retrieval;Information Processing and Management,2018

5. Distributed information retrieval,2000

Cited by 4 articles.

1. Utilizing Ant Colony Optimization for Result Merging in Federated Search;Engineering, Technology & Applied Science Research;2024-08-02

2. Federated search techniques: an overview of the trends and state of the art;Knowledge and Information Systems;2023-07-10

3. Understanding the impact of query expansion on federated search;Multimedia Tools and Applications;2023-06-21

4. Snippet-based result merging in federated search;Journal of Information Science;2023-01-12