Affiliation:
1. Yonsei University, Seoul, Korea
2. POSTECH
3. Microsoft Research
Abstract
A commercial web search engine shards its index across many servers, so the response time of a search query is dominated by the slowest server that processes it. Prior approaches improve responsiveness by reducing the tail latency, or high-percentile response time, of an individual search server. They predict query execution time: a query predicted to be long-running is run in parallel, and otherwise it runs sequentially. These approaches are, however, not accurate enough to reduce tail latency when responses are aggregated from many servers, because aggregation requires each server to reduce latency at a substantially higher percentile (e.g., the 99.99th percentile), which we call extreme tail latency.
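As a rough worked example of why aggregation pushes the per-server target to a much higher percentile, suppose (purely as illustrative assumptions, not figures from the paper) that a query fans out to N = 100 servers whose latencies are independent, and that the aggregated response should meet its latency bound for 99% of queries. Each server must then meet the bound with probability p such that:

```latex
\[
  p^{N} \ge 0.99
  \quad\Longrightarrow\quad
  p \ge 0.99^{1/N} = 0.99^{1/100} = e^{\ln(0.99)/100} \approx 0.9999 .
\]
```

Under these assumptions, a 99th-percentile target at the aggregator corresponds to roughly a 99.99th-percentile target at each server.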
To address the tighter requirements of extreme tail latency, we propose a new design space for the problem, subsuming existing work and also proposing a new solution space. Existing work makes a prediction using features available at indexing time and focuses on optimizing prediction features for accelerating tail queries. In contrast, we identify “when to predict?” as another key optimization question. This opens up a new solution: delaying the prediction by a short duration, which allows many short-running queries to complete without parallelization and, at the same time, allows the predictor to collect a set of dynamic features using runtime information. This new question expands the solution space in two meaningful ways. First, we see a significant reduction in tail latency by leveraging “dynamic” features collected at runtime, which estimate query execution time with higher accuracy. Second, we can ask whether to override a prediction when its “predictability” is low. We show that considering predictability achieves a higher recall, so that more long-running queries are accelerated.
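The delayed-prediction idea described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Prediction` type, the `decide` function, the threshold values, and the choice to accelerate a query whenever predictability is low are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    predicted_ms: float  # estimated total execution time of the query
    confidence: float    # hypothetical "predictability" score in [0, 1]


def decide(elapsed_ms: float, finished: bool, prediction: Prediction,
           delay_ms: float = 10.0, long_ms: float = 80.0,
           min_conf: float = 0.5) -> str:
    """Return one of 'done', 'wait', 'sequential', 'accelerate'.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    if finished:
        # Short queries complete during the delay and never need a prediction.
        return "done"
    if elapsed_ms < delay_ms:
        # Keep running sequentially while runtime (dynamic) features accumulate.
        return "wait"
    if prediction.confidence < min_conf:
        # Assumed override: when predictability is low, accelerate anyway,
        # trading some precision for higher recall of long-running queries.
        return "accelerate"
    # Otherwise trust the prediction made with dynamic features.
    return "accelerate" if prediction.predicted_ms >= long_ms else "sequential"


if __name__ == "__main__":
    print(decide(12.0, False, Prediction(predicted_ms=200.0, confidence=0.9)))
```

In this sketch the predictor is consulted only after `delay_ms` has elapsed, so queries that finish earlier incur no prediction or parallelization cost.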
With this prediction, we propose to accelerate the queries that are predicted to be long-running. In our preliminary work, we focused on parallelization as an acceleration scenario. We extend that work to consider heterogeneous multicore hardware for acceleration. This hardware combines processor cores with different microarchitectures, such as energy-efficient little cores and high-performance big cores, and accelerating web search using this hardware has remained an open problem.
We evaluate the proposed prediction framework in two scenarios: (1) query parallelization on a multicore processor and (2) query scheduling on a heterogeneous processor. Our extensive evaluation results show that, in both scenarios, the proposed framework is effective in reducing the extreme tail latency compared to a state-of-the-art predictor because of its higher recall, and it improves server throughput by more than 70% because of its improved precision.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Networks and Communications