Abstract
Purpose
In federated search, a query is sent simultaneously to multiple resources, and each one returns a list of results. These lists are combined into a single list through the results merging process. In this work, the authors apply machine learning methods to results merging in federated patent search. Although several results merging methods have been developed, none of them has been tested on patent data, and none has considered a range of machine learning models. The authors therefore evaluate state-of-the-art methods on patent data and propose two new results merging methods based on machine learning models.
Design/methodology/approach
The methods rely on a centralized index containing samples of documents from all the remote resources, and they use machine learning models to estimate comparable scores for the documents retrieved by different resources. The authors examine the new methods in cooperative settings, where the remote search engines return document scores, and in uncooperative settings, where they do not. For uncooperative environments, they propose two methods for assigning document scores.
Findings
The new results merging methods were compared against state-of-the-art models and outperformed them in many cases, with significant improvements. The random forest model achieves the best results of all the models tested and offers new insights into the results merging problem.
Originality/value
In this article, the authors demonstrate that machine learning models can replace standard methods and models that have been used for results merging for many years. Their methods outperformed state-of-the-art estimation methods for results merging and proved more effective for federated patent search.
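The merging step described under Design/methodology/approach can be illustrated with a minimal Python sketch. It is written under stated assumptions and is not the authors' implementation. It assumes that, for a given query, every returned document that also appears in the centralized sample index has a score from that index. For each resource, a model is fitted that maps the resource's own scores to those centralized scores, and the fitted model then converts every document from that resource to the comparable scale before a single sorted list is produced. The article evaluates several machine learning models, with random forest performing best; to stay short and within the standard library, this sketch swaps in a one-feature least-squares linear regression instead. The helper names (`fit_line`, `rank_scores`, `merge_results`), the rank-based pseudo-score formula for the uncooperative case, and the fallback for resources with too little overlap are all hypothetical and are not necessarily either of the two score-assignment methods proposed in the article.

```python
"""Minimal sketch of regression-based results merging (illustrative only)."""


def fit_line(xs, ys):
    """Least-squares fit of ys ~ a * xs + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # All local scores identical: predict the mean centralized score.
        return 0.0, mean_y
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = cov / var_x
    return a, mean_y - a * mean_x


def rank_scores(doc_ids):
    """Uncooperative case (hypothetical scheme): score = 1 - (rank - 1) / n."""
    n = len(doc_ids)
    return [(doc, 1.0 - i / n) for i, doc in enumerate(doc_ids)]


def merge_results(result_lists, central_scores):
    """Merge per-resource lists into one ranking of document ids.

    result_lists: resource name -> list of (doc_id, local_score), best first.
    central_scores: doc_id -> score of that document in the centralized
        sample index for the same query (only sampled documents appear).
    """
    merged = {}
    for results in result_lists.values():
        # Training pairs: documents seen both in this list and in the sample index.
        pairs = [(s, central_scores[d]) for d, s in results if d in central_scores]
        if len(pairs) >= 2:
            a, b = fit_line([x for x, _ in pairs], [y for _, y in pairs])
        else:
            # Hypothetical fallback: too little overlap, keep local scores as-is.
            a, b = 1.0, 0.0
        for doc, score in results:
            estimate = a * score + b
            # A document returned by several resources keeps its highest estimate.
            if doc not in merged or estimate > merged[doc]:
                merged[doc] = estimate
    return sorted(merged, key=merged.get, reverse=True)


if __name__ == "__main__":
    lists = {
        "A": [("a1", 90.0), ("d1", 80.0), ("a2", 50.0), ("d2", 40.0)],  # 0-100 scale
        "B": [("b1", 0.9), ("d3", 0.7), ("d4", 0.5), ("b2", 0.2)],  # 0-1 scale
    }
    central = {"d1": 0.8, "d2": 0.4, "d3": 0.85, "d4": 0.45}
    print(merge_results(lists, central))
    # prints ['b1', 'a1', 'd3', 'd1', 'a2', 'd4', 'd2', 'b2']
```

Running the file prints one list in which documents from the two differently scaled resources are interleaved by their estimated centralized scores. In an uncooperative setting, the output of `rank_scores(ids)` would take the place of the local scores in `result_lists`; a tree ensemble such as random forest could replace `fit_line` without changing the rest of the flow.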
Subject
Library and Information Sciences, Information Systems