Small subset queries and bloom filters using ternary associative memories, with applications

Author:

Goel Ashish (1), Gupta Pankaj (2)

Affiliation:

1. Stanford University, Stanford, CA, USA

2. Twitter, Inc., San Francisco, CA, USA

Abstract

Associative memories offer high levels of parallelism in matching a query against stored entries. We design and analyze an architecture which uses a single lookup into a Ternary Content Addressable Memory (TCAM) to solve the subset query problem for small sets, i.e., to check whether a given set (the query) contains (or alternatively, is contained in) any one of a large collection of sets in a database. We use each TCAM entry as a small Ternary Bloom Filter (each 'bit' of which is one of {0, 1, wildcard}) to store one of the sets in the collection. Like Bloom filters, our architecture is susceptible to false positives. Since each TCAM entry is quite small, asymptotic analyses of Bloom filters do not directly apply. Surprisingly, we are able to show that the asymptotic false positive probability formula can be safely used if we penalize the small Bloom filter by taking away just one bit of storage and adding just half an extra set element before applying the formula. We believe that this analysis is independently interesting. The subset query problem has applications in databases, network intrusion detection, packet classification in Internet routers, and information retrieval. We demonstrate our architecture on one illustrative streaming application: intrusion detection in network traffic. By shingling (i.e., taking consecutive bytes of) the strings in the database, we can perform a single subset query, and hence a single TCAM search, to skip many bytes in the stream. We evaluate our scheme on the open source CLAM anti-virus database, for worst-case as well as random streams. Our architecture appears to be at least one order of magnitude faster than previous approaches. Since each individual Bloom filter must fit in a single TCAM entry (currently 72 to 576 bits), our solution applies only when each set has small cardinality. However, this is sufficient for many typical applications. Also, recent algorithms for the subset-query problem use a small-set version as a subroutine.
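
The ternary Bloom filter idea in the abstract can be illustrated with a small software simulation. The sketch below is not the authors' implementation: the entry width, the number of hash functions, the SHA-256-based hashing, and all function names are assumptions chosen for illustration. A TCAM entry is modeled as a (value, care-mask) pair, where bits outside the mask act as wildcards. To find a stored set contained in the query, hashed positions are stored as 1 and all other positions as wildcards; to find a stored set that contains the query, unhashed positions are stored as 0 and hashed positions as wildcards.

```python
import hashlib

WIDTH = 72       # assumed entry width in bits (the abstract cites 72 to 576)
NUM_HASHES = 3   # assumed number of hash functions per element


def bit_positions(element: bytes, width: int = WIDTH, k: int = NUM_HASHES):
    """Return k bit positions for an element (illustrative hashing only)."""
    positions = []
    for i in range(k):
        digest = hashlib.sha256(bytes([i]) + element).digest()
        positions.append(int.from_bytes(digest[:8], "big") % width)
    return positions


def bloom_bits(elements, width: int = WIDTH, k: int = NUM_HASHES) -> int:
    """Encode a set as an ordinary binary Bloom filter, packed into an int."""
    bits = 0
    for element in elements:
        for pos in bit_positions(element, width, k):
            bits |= 1 << pos
    return bits


def entry_for_stored_subset(stored):
    """Entry that matches any query whose Bloom bits cover those of `stored`:
    1 at hashed positions, wildcard everywhere else."""
    bits = bloom_bits(stored)
    return (bits, bits)  # (value, care-mask)


def entry_for_stored_superset(stored):
    """Entry that matches any query whose Bloom bits lie inside those of
    `stored`: 0 at unhashed positions, wildcard at hashed positions."""
    bits = bloom_bits(stored)
    all_ones = (1 << WIDTH) - 1
    return (0, all_ones & ~bits)


def matches(entry, key: int) -> bool:
    """Ternary match: key must agree with value on every cared-about bit."""
    value, mask = entry
    return ((key ^ value) & mask) == 0


def tcam_lookup(entries, key: int):
    """Return the index of the first matching entry, or None. Hardware
    compares all entries in parallel; this loop only simulates that."""
    for index, entry in enumerate(entries):
        if matches(entry, key):
            return index
    return None


if __name__ == "__main__":
    database = [{b"a", b"b"}, {b"c", b"d", b"e"}]
    entries = [entry_for_stored_subset(s) for s in database]
    query = {b"a", b"b", b"x"}
    print(tcam_lookup(entries, bloom_bits(query)))
```

A true containment always produces a match, so there are no false negatives. A match may still be a false positive, which is why the small-filter false positive analysis described in the abstract matters.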

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications,Hardware and Architecture,Software

Reference: 41 articles.

1. Fast data stream algorithms using associative memories

2. Blog entry about query sizes. http://www.beussery.com/blog/index.php/2008/02/google-average-number-of-word%s-per-query-have-increased/.

3. Space/time trade-offs in hash coding with allowable errors

4. A fast string searching algorithm

Cited by 28 articles.

1. Space-efficient and high-performance inline deduplication for emerging hybrid storage system with Libra+;Journal of Systems Architecture;2024-05

2. Optimizing 0-RTT Key Exchange with Full Forward Security;Proceedings of the 2023 on Cloud Computing Security Workshop;2023-11-26

3. Scalably Detecting Third-Party Android Libraries With Two-Stage Bloom Filtering;IEEE Transactions on Software Engineering;2023-04-01

4. Lightweight certificate revocation for low-power IoT with end-to-end security;Journal of Information Security and Applications;2023-03

5. Tree sketch: An accurate and memory-efficient sketch for network-wide measurement;Computer Communications;2022-10
