Abstract
We develop a new derivative-based theory and algorithm for nonbacktracking regex matching that supports anchors and counting, preserves backtracking semantics, and can be extended with lookarounds. The algorithm has been implemented as a new regex backend in .NET and was extensively tested as part of the formal release process of .NET 7. We present a formal proof of the correctness of the algorithm, which we believe to be the first of its kind concerning industrial implementations of regex matchers. The paper describes the complete foundation, the matching algorithm, and key aspects of the implementation involving a regex rewrite system, as well as a comprehensive evaluation over industrial case studies and other regex engines.
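For readers unfamiliar with derivative-based matching, the sketch below illustrates the classic Brzozowski construction that this line of work builds on. It is not the algorithm of the paper: it omits anchors, counting, lookarounds, and backtracking (leftmost, priority-ordered) match semantics, and it only answers whether a whole string matches. All class and function names (`Empty`, `Eps`, `Char`, `Alt`, `Cat`, `Star`, `nullable`, `derivative`, `matches`) are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


class Regex:
    """Base class for the illustrative regex AST (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Regex):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Regex):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Char(Regex):
    c: str


@dataclass(frozen=True)
class Alt(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Cat(Regex):
    left: Regex
    right: Regex


@dataclass(frozen=True)
class Star(Regex):
    inner: Regex


def nullable(r: Regex) -> bool:
    """True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    raise TypeError(f"unknown regex node: {r!r}")


def alt(a: Regex, b: Regex) -> Regex:
    """Alternation with light simplification to keep derivatives small."""
    if isinstance(a, Empty):
        return b
    if isinstance(b, Empty) or a == b:
        return a
    return Alt(a, b)


def cat(a: Regex, b: Regex) -> Regex:
    """Concatenation with light simplification."""
    if isinstance(a, Empty) or isinstance(b, Empty):
        return Empty()
    if isinstance(a, Eps):
        return b
    if isinstance(b, Eps):
        return a
    return Cat(a, b)


def derivative(r: Regex, ch: str) -> Regex:
    """Regex for the suffixes w such that r accepts ch + w."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Char):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Alt):
        return alt(derivative(r.left, ch), derivative(r.right, ch))
    if isinstance(r, Cat):
        first = cat(derivative(r.left, ch), r.right)
        if nullable(r.left):
            return alt(first, derivative(r.right, ch))
        return first
    if isinstance(r, Star):
        return cat(derivative(r.inner, ch), r)
    raise TypeError(f"unknown regex node: {r!r}")


def matches(r: Regex, text: str) -> bool:
    """Whole-string match: one derivative per character, no backtracking."""
    for ch in text:
        r = derivative(r, ch)
    return nullable(r)


# Example pattern: (a|b)*abb
PATTERN = Cat(Star(Alt(Char("a"), Char("b"))),
              Cat(Char("a"), Cat(Char("b"), Char("b"))))
```

Each input character is consumed exactly once, so the matcher never revisits earlier input. The smart constructors `alt` and `cat` are a simplified stand-in for the kind of rewrite rules a practical engine would need to keep the derivative terms from growing.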
Publisher
Association for Computing Machinery (ACM)
Subject
Safety, Risk, Reliability and Quality; Software
Cited by
9 articles.
1. A Coq Mechanization of JavaScript Regular Expression Semantics. Proceedings of the ACM on Programming Languages, 2024-08-15.
2. Linear Matching of JavaScript Regular Expressions. Proceedings of the ACM on Programming Languages, 2024-06-20.
3. One Automaton to Rule Them All: Beyond Multiple Regular Expressions Execution. 2024 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 2024-03-02.
4. Lean Formalization of Extended Regular Expression Matching with Lookarounds. Proceedings of the 13th ACM SIGPLAN International Conference on Certified Programs and Proofs, 2024-01-09.
5. Efficient Matching of Regular Expressions with Lookaround Assertions. Proceedings of the ACM on Programming Languages, 2024-01-05.