Affiliation:
1. Turgut Ozal University
2. Middle East Technical University
3. Yahoo Labs, Barcelona
4. Bilkent University
Abstract
Web search engines are known to cache the results of previously issued queries. The stored results typically contain the document summaries and some data that is used to construct the final search result page returned to the user. An alternative strategy is to store in the cache only the result document IDs, which take much less space, allowing results of more queries to be cached. These two strategies lead to an interesting trade-off between the hit rate and the average query response latency. In this work, in order to exploit this trade-off, we propose a hybrid result caching strategy where a dynamic result cache is split into two sections: an HTML cache and a docID cache. Moreover, using a realistic cost model, we evaluate the performance of different result prefetching strategies for the proposed hybrid cache and the baseline HTML-only cache. Finally, we propose a machine learning approach to predict singleton queries, which occur only once in the query stream. We show that when the proposed hybrid result caching strategy is coupled with the singleton query predictor, the hit rate is further improved.
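The two-section layout described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the replacement or admission policy, so everything below is an assumption made for illustration: both sections use LRU replacement, capacities are counted in entries rather than bytes, and an entry evicted from the HTML section is demoted to the docID section. The names `LRU`, `HybridResultCache`, `lookup`, and `store` are hypothetical and are not taken from the paper.

```python
from collections import OrderedDict


class LRU:
    """Tiny LRU map holding at most `capacity` entries (illustrative helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert an entry; return the evicted (key, value) pair, or None."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)  # drop least recently used
        return None


class HybridResultCache:
    """Result cache split into an HTML section and a docID section."""

    def __init__(self, html_capacity, docid_capacity):
        self.html = LRU(html_capacity)      # full result pages (large entries)
        self.docids = LRU(docid_capacity)   # result document IDs only (small entries)

    def lookup(self, query):
        """Return (outcome, payload) for a query."""
        entry = self.html.get(query)
        if entry is not None:
            return "html_hit", entry[1]     # page can be served directly
        ids = self.docids.get(query)
        if ids is not None:
            return "docid_hit", ids         # summaries must still be regenerated
        return "miss", None                 # query must be fully processed

    def store(self, query, doc_ids, page):
        """Cache a freshly computed result in the HTML section."""
        evicted = self.html.put(query, (doc_ids, page))
        if evicted is not None:
            old_query, (old_ids, _old_page) = evicted
            # Assumed policy: keep only the docIDs of the evicted entry.
            self.docids.put(old_query, old_ids)
```

In this sketch a docID hit is cheaper than a miss but more expensive than an HTML hit, because the result page still has to be rebuilt from the stored document IDs. That is one way to picture the trade-off between hit rate and average response latency that the abstract describes.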
Funder
Türkiye Bilimsel ve Teknolojik Arastirma Kurumu
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications
Cited by
11 articles.
1. Three-level Compact Caching for Search Engines Based on Solid State Drives;2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys);2021-12
2. Topical result caching in web search engines;Information Processing & Management;2020-05
3. Caching Scores for Faster Query Processing with Dynamic Pruning in Search Engines;Proceedings of the 28th ACM International Conference on Information and Knowledge Management;2019-11-03
4. C3C: A New Static Content-Based Three-Level Web Cache;IEEE Access;2019
5. On the Impact of Storing Query Frequency History for Search Engine Result Caching;Lecture Notes in Computer Science;2019