Asynchronous Automata Processing on GPUs

Published:2023-02-27 Issue:1 Volume:7 Page:1-27
ISSN:2476-1249
Container-title:Proceedings of the ACM on Measurement and Analysis of Computing Systems
language:en
Short-container-title:Proc. ACM Meas. Anal. Comput. Syst.

Author:

Liu Hongyuan¹, Pai Sreepathi², Jog Adwait³

Affiliation:

1. William & Mary / The Hong Kong University of Science and Technology (Guangzhou), Williamsburg, VA, USA

2. University of Rochester, Rochester, NY, USA

3. William & Mary / University of Virginia, Williamsburg, VA, USA

Abstract

Finite-state automata serve as compute kernels for many application domains such as pattern matching and data analytics. Existing approaches on GPUs exploit three levels of parallelism in automata processing tasks: 1) input-stream-level, 2) automaton-level and 3) state-level. Among these, only state-level parallelism is intrinsic to automata, while the other two levels depend on the number of automata and input streams to be processed. As GPU resources increase, a parallelism-limited automata processing task can underutilize GPU compute resources. To this end, we propose AsyncAP, a low-overhead approach that optimizes for both scalability and throughput. Our insight is that most automata processing tasks have an additional source of parallelism, originating from the input symbols, which has not been leveraged before. Making the matching process associated with the automata tasks asynchronous, i.e., parallel GPU threads start processing an input stream from different input locations instead of processing it serially, improves throughput significantly and scales with input length. When the task does not have enough parallelism to utilize all the GPU cores, detailed evaluation across 12 applications shows that AsyncAP achieves up to 58× speedup on average over the state-of-the-art GPU automata processing engine. When the tasks have enough parallelism to utilize GPU cores, AsyncAP still achieves 2.4× speedup.
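
The idea of starting the match at different input locations can be illustrated with a small CPU-side sketch. This is not the AsyncAP implementation and not GPU code; it is a toy under stated assumptions: a hypothetical four-state NFA for the pattern `ab+c`, unanchored matching (a match may begin at any input position), and Python threads standing in for GPU threads. The names `TRANSITIONS`, `match_serial`, `match_from`, and `match_async` are invented for illustration. The sketch only shows that matches beginning at different positions can be followed independently and still produce the same reports as a serial scan.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical toy NFA for the pattern "ab+c" (not taken from the paper).
TRANSITIONS = {
    (0, "a"): {1},
    (1, "b"): {2},
    (2, "b"): {2},
    (2, "c"): {3},
}
START = {0}
ACCEPT = {3}


def step(active, symbol):
    """Advance every active state by one input symbol."""
    nxt = set()
    for state in active:
        nxt |= TRANSITIONS.get((state, symbol), set())
    return nxt


def match_serial(text):
    """Serial scan: one symbol at a time, start states re-enabled each step."""
    reports = set()
    active = set()
    for pos, symbol in enumerate(text):
        active = step(active | START, symbol)
        if active & ACCEPT:
            reports.add(pos)  # record the end position of a match
    return reports


def match_from(text, begin):
    """Follow only matches that begin at `begin`; stop once none survive."""
    reports = set()
    active = set(START)
    for pos in range(begin, len(text)):
        active = step(active, text[pos])
        if not active:
            break
        if active & ACCEPT:
            reports.add(pos)
    return reports


def match_async(text, workers=4):
    """Process every start position independently, then merge the reports."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: match_from(text, b), range(len(text))))
    reports = set()
    for part in parts:
        reports |= part
    return reports
```

For the input `"xabbcabc"`, both `match_serial` and `match_async` report matches ending at positions 4 and 7. Most start positions die after a single symbol, which is why independent per-position work can stay cheap in this toy; the Python thread pool here demonstrates independence of the work items, not a speedup.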

Funder

National Science Foundation

Publisher

Association for Computing Machinery (ACM)

Subject

Computer Networks and Communications; Hardware and Architecture; Safety, Risk, Reliability and Quality; Computer Science (miscellaneous)

Link

https://dl.acm.org/doi/pdf/10.1145/3579453

References (67 articles)

1. M. Karim Abdalla et al. 2013. Scheduling and Execution of Compute Tasks. US 2013/0185728A1.

2. K. Angstadt, J. Wadden, V. Dang, T. Xie, D. Kramp, W. Weimer, M. Stan, and K. Skadron. 2018. MNCaRT: An Open-Source Multi-Architecture Automata-Processing Research and Execution Ecosystem. IEEE Computer Architecture Letters (CAL) (2018).

3. Scalable Algorithms for NFA Multi-Striding and NFA-Based Deep Packet Inspection on GPUs

4. A. Bakhoda, G.L. Yuan, W.W.L. Fung, H. Wong, and T.M. Aamodt. 2009. Analyzing CUDA Workloads Using a Detailed GPU Simulator. In ISPASS.

5. An improved algorithm to accelerate regular expression evaluation

Cited by 1 article.

1. Regular Expressions on Modern GPGPUs. 16th Workshop on General Purpose Processing Using GPU. 2024-03-02.