Machine learning methods for results merging in patent retrieval

Published:2023-02-27
ISSN:2514-9288
Container-title:Data Technologies and Applications
language:en
Short-container-title:DTA

Author:

Stamatis Vasileios, Salampasis Michail, Diamantaras Konstantinos

Abstract

Purpose: In federated search, a query is sent simultaneously to multiple resources, and each of them returns a list of results. These lists are combined into a single list by the results merging process. In this work, the authors apply machine learning methods to results merging in federated patent search. Although several results merging methods have been developed, none has been tested on patent data or has considered multiple machine learning models. The authors therefore experiment with state-of-the-art methods on patent data and propose two new results merging methods that use machine learning models.

Design/methodology/approach: The methods are based on a centralized index containing samples of documents from all the remote resources, and they use machine learning models to estimate comparable scores for the documents retrieved from different resources. The authors examine the new methods in cooperative settings, where document scores from the remote search engines are available, and in uncooperative settings, where they are not. For uncooperative environments, they propose two methods for assigning document scores.

Findings: The effectiveness of the new results merging methods was measured against state-of-the-art models and found to be superior in many cases, with significant improvements. The random forest model achieves the best results of all the models compared and offers new insights into the results merging problem.

Originality/value: The authors show that machine learning models can replace the standard methods and models that have been used for results merging for many years. The proposed methods outperformed state-of-the-art estimation methods for results merging and proved more effective for federated patent search.
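
As a rough illustration of the idea in the Design/methodology/approach part of the abstract, the sketch below maps each resource's scores onto the scale of a centralized sample index and then merges by the estimated scores. It is not the authors' implementation. It fits a plain least-squares line per resource as a stand-in for the machine learning models the paper evaluates (such as random forest), and the names `fit_line`, `merge_results`, `result_lists` and `central_scores` are hypothetical. It also assumes the cooperative setting, where remote document scores are available.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a * x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All remote scores are identical: predict the mean centralized score.
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return a, my - a * mx


def merge_results(result_lists, central_scores):
    """Merge per-resource result lists into one ranked list.

    result_lists:   {resource_name: [(doc_id, remote_score), ...]}
    central_scores: {doc_id: score} for documents that also appear in the
                    centralized sample index (assumed input).
    """
    merged = []
    for resource, results in result_lists.items():
        # Training pairs: documents returned by the resource that are also
        # scored by the centralized sample index.
        pairs = [(s, central_scores[d]) for d, s in results if d in central_scores]
        if len(pairs) < 2:
            # Too few overlap documents to fit a model; this sketch simply
            # skips the resource, a real system would need a fallback.
            continue
        a, b = fit_line([s for s, _ in pairs], [c for _, c in pairs])
        # Estimate a comparable score for every document from this resource.
        merged.extend((doc_id, a * score + b) for doc_id, score in results)
    return sorted(merged, key=lambda item: item[1], reverse=True)
```

In an uncooperative setting the remote scores would first have to be assigned by some other means (for example from result ranks) before a mapping like this could be trained; the abstract does not specify the two assignment methods the authors propose, so none is sketched here.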

Publisher

Emerald

Subject

Library and Information Sciences,Information Systems

References: 34 articles.

1. The FedLemur project: federated search in the real world;Journal of the American Society for Information Science and Technology,2006

2. Callan, J. (2002), “Distributed information retrieval”, in Croft, W.B. (Ed.), Advances in Information Retrieval, The Information Retrieval Series, Vol. 7, Springer, Boston, MA, pp. 127-150. doi: 10.1007/0-306-47019-5_5

3. Query-based sampling of text databases;ACM Transactions on Information Systems,2001

4. Searching distributed collections with inference networks,1995

5. Clarke, N.S. (2018), “The basics of patent searching”, World Patent Information, Vol. 54, pp. S4-S10.

Cited by 2 articles.

1. Utilizing Ant Colony Optimization for Result Merging in Federated Search;Engineering, Technology & Applied Science Research;2024-08-02

2. The Analysis of Images of Mathematical and Chemical Formulas from Patent Documents;2024 International Russian Smart Industry Conference (SmartIndustryCon);2024-03-25