Affiliation:
1. University of Copenhagen, Denmark
2. Jobindex, Denmark
Abstract
We present and illustrate Kleenex, a language for expressing general nondeterministic finite transducers, and its novel compilation to streaming string transducers with essentially optimal streaming behavior, worst-case linear-time performance, and sustained high throughput. Its underlying theory is based on transducer decomposition into oracle and action machines: the oracle machine performs streaming greedy disambiguation of the input; the action machine performs the output actions. In use cases, Kleenex achieves consistently high throughput rates around the 1 Gbps range on stock hardware. It performs well, especially in complex use cases, in comparison to both specialized and related tools such as GNU awk, GNU sed, GNU grep, RE2, Ragel, and regular-expression libraries.
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Graphics and Computer-Aided Design, Software
References
77 articles.
1. ISBN 0-444-88071-2 and 0-262-22038-5.
2. Complexity of Regular Functions
Cited by
6 articles.
1. Efficient Matching of Regular Expressions with Lookaround Assertions;Proceedings of the ACM on Programming Languages;2024-01-05
2. A GPU-accelerated Data Transformation Framework Rooted in Pushdown Transducers;2022 IEEE 29th International Conference on High Performance Computing, Data, and Analytics (HiPC);2022-12
3. Typed parsing and unparsing for untyped regular expression engines;Proceedings of the 2019 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation - PEPM 2019;2019
4. StreamQRE: modular specification and efficient evaluation of quantitative queries over streaming data;Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation;2017-06-14
5. Fusing effectful comprehensions;Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation;2017-06-14