Kangaroo: Theory and Practice of Caching Billions of Tiny Objects on Flash

Author:

Sara McAllister¹, Benjamin Berg¹, Julian Tutuncu-Macias², Juncheng Yang¹, Sathya Gunasekar³, Jimmy Lu³, Daniel S. Berger⁴, Nathan Beckmann¹, Gregory R. Ganger¹

Affiliation:

1. Carnegie Mellon University, Pittsburgh, PA

2. Goldman Sachs, New York City, NY

3. Meta, Menlo Park, CA

4. Microsoft Research and University of Washington, Redmond, WA

Abstract

Many social-media and IoT services have very large working sets consisting of billions of tiny (≈100 B) objects. Large, flash-based caches are important to serving these working sets at acceptable monetary cost. However, caching tiny objects on flash is challenging for two reasons: (i) SSDs can read/write data only in multi-KB “pages” that are much larger than a single object, stressing the limited number of times flash can be written; and (ii) very few bits per cached object can be kept in DRAM without losing flash’s cost advantage. Unfortunately, existing flash-cache designs fall short of addressing these challenges: write-optimized designs require too much DRAM, and DRAM-optimized designs require too many flash writes. We present Kangaroo, a new flash-cache design that optimizes both DRAM usage and flash writes to maximize cache performance while minimizing cost. Kangaroo combines a large, set-associative cache with a small, log-structured cache. The set-associative cache requires minimal DRAM, while the log-structured cache minimizes Kangaroo’s flash writes. Experiments using traces from Meta and Twitter show that Kangaroo achieves DRAM usage close to the best prior DRAM-optimized design, flash writes close to the best prior write-optimized design, and miss ratios better than both. Kangaroo’s design is Pareto-optimal across a range of allowed write rates, DRAM sizes, and flash sizes, reducing misses by 29% over the state of the art. These results are corroborated by analytical models presented herein and with a test deployment of Kangaroo in a production flash cache at Meta.
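The two-tier design described in the abstract can be sketched in a few lines of Python. This is a toy model only: the class name, capacities, and FIFO eviction policy are illustrative assumptions, not Kangaroo's actual implementation. The key idea it demonstrates is that new objects are buffered in a small log, and a flush groups buffered objects by destination set so that one set (page) rewrite moves many objects at once, amortizing flash writes.

```python
from collections import OrderedDict, defaultdict

class KangarooSketch:
    """Toy model of a two-tier flash cache in the style of Kangaroo.

    - log:  small log-structured tier; admits all new objects and buffers
            them so objects destined for the same set flush together.
    - sets: large set-associative tier; an object's hash picks its single
            set, so lookups need no per-object DRAM index.
    """

    def __init__(self, num_sets=4, set_capacity=4, log_capacity=8):
        self.num_sets = num_sets
        self.set_capacity = set_capacity
        self.log_capacity = log_capacity
        self.log = OrderedDict()                              # buffered inserts
        self.sets = [OrderedDict() for _ in range(num_sets)]  # FIFO per set
        self.flash_writes = 0                                 # set rewrites

    def _set_of(self, key):
        return hash(key) % self.num_sets

    def get(self, key):
        if key in self.log:                    # check the small log first
            return self.log[key]
        return self.sets[self._set_of(key)].get(key)  # then the one set

    def put(self, key, value):
        self.log[key] = value                  # all inserts go to the log
        if len(self.log) > self.log_capacity:
            self._flush()

    def _flush(self):
        # Group buffered objects by destination set: one rewrite per set
        # moves several objects, which is where the write savings come from.
        by_set = defaultdict(list)
        for key, value in self.log.items():
            by_set[self._set_of(key)].append((key, value))
        self.log.clear()
        for set_idx, items in by_set.items():
            s = self.sets[set_idx]
            for key, value in items:
                s[key] = value
                if len(s) > self.set_capacity:
                    s.popitem(last=False)      # FIFO-evict oldest in the set
            self.flash_writes += 1             # whole set rewritten once
```

With the toy parameters above, flushing eight buffered objects into four sets costs at most four set rewrites instead of eight, mirroring (at miniature scale) how the log tier reduces write amplification in the real design.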

Funder

NDSEG Fellowship

Google Research Scholar Award

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture


Cited by 2 articles.

1. FIFO Queues Are All You Need for Cache Eviction. Proceedings of the 29th Symposium on Operating Systems Principles (SOSP), 2023-10-23.

2. Darwin: Flexible Learning-based CDN Caching. Proceedings of the ACM SIGCOMM 2023 Conference, 2023-09.
