Regex matching with counting-set automata

Author:

Turoňová Lenka (1), Holík Lukáš (1), Lengál Ondřej (1), Saarikivi Olli (2), Veanes Margus (2), Vojnar Tomáš (1)

Affiliation:

1. Brno University of Technology, Czechia

2. Microsoft, USA

Abstract

We propose a solution to the problem of efficiently matching regular expressions (regexes) with bounded repetition, such as (ab){1,100}, using deterministic automata. For this, we introduce novel counting-set automata (CsAs), automata with registers that can hold sets of bounded integers and can be manipulated by a limited portfolio of constant-time operations. We present an algorithm that compiles a large sub-class of regexes to deterministic CsAs. This includes (1) a novel Antimirov-style translation of regexes with counting to counting automata (CAs), nondeterministic automata with bounded counters, and (2) our main technical contribution, a determinization of CAs that outputs CsAs. The main advantage of this workflow is that the size of the produced CsAs does not depend on the repetition bounds used in the regex (while the size of the DFA is exponential in them). Our experimental results confirm that deterministic CsAs produced from practical regexes with repetition are indeed vastly smaller than the corresponding DFAs. More importantly, our prototype matcher based on CsA simulation handles practical regexes with repetition regardless of the sizes of the counter bounds. It easily copes with regexes with repetition where state-of-the-art matchers struggle.
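To make the idea of a register holding a set of bounded integers more concrete, here is a minimal Python sketch. It is not the construction from the paper: the pattern `.*a.{n}` (matched against the whole input), the class `CountingSet`, and the function `matches` are assumptions chosen for illustration. A DFA for this pattern needs a number of states exponential in `n`, whereas the sketch keeps a single control state plus one set of counter values, and every operation on that set takes constant time.

```python
from collections import deque


class CountingSet:
    """Hypothetical set of distinct integers in [0, bound] with O(1) operations.

    Values are stored relative to a shared offset, so incrementing every
    element only requires bumping the offset. The deque is kept sorted with
    the largest (oldest) value on the left.
    """

    def __init__(self, bound):
        self.bound = bound
        self.offset = 0
        self.items = deque()  # each entry is (value - offset)

    def increment_all(self):
        """Add 1 to every element and drop the one that exceeds the bound."""
        self.offset += 1
        # Elements are distinct and were all <= bound, so at most one
        # element (the largest) can now exceed the bound.
        if self.items and self.items[0] + self.offset > self.bound:
            self.items.popleft()

    def insert_zero(self):
        """Insert the value 0 unless it is already present."""
        if not self.items or self.items[-1] + self.offset != 0:
            self.items.append(-self.offset)

    def max(self):
        """Return the largest element, or None if the set is empty."""
        return self.items[0] + self.offset if self.items else None


def matches(text, n, sym="a"):
    """Return True iff the whole text matches .*a.{n} (illustrative only)."""
    counters = CountingSet(n)
    for ch in text:
        counters.increment_all()    # every pending count sees one more symbol
        if ch == sym:
            counters.insert_zero()  # start a new count at this occurrence
    # Accept iff some occurrence of sym is followed by exactly n symbols.
    return counters.max() == n
```

Each input symbol triggers at most one offset update, one removal, and one insertion, so the per-symbol cost in this sketch does not grow with the bound `n`.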

Publisher

Association for Computing Machinery (ACM)

Subject

Safety, Risk, Reliability and Quality, Software

Cited by 8 articles.

1. String Constraints with Regex-Counting and String-Length Solved More Efficiently;Dependable Software Engineering. Theories, Tools, and Applications;2023-12-15

2. Derivative Based Nonbacktracking Real-World Regex Matching with Backtracking Semantics;Proceedings of the ACM on Programming Languages;2023-06-06

3. Regular Expression Matching using Bit Vector Automata;Proceedings of the ACM on Programming Languages;2023-04-06

4. Effective ReDoS Detection by Principled Vulnerability Modeling and Exploit Generation;P IEEE S SECUR PRIV;2023

5. Fast Matching of Regular Patterns with Synchronizing Counting;Lecture Notes in Computer Science;2023
