T-Rec: Fine-Grained Language-Agnostic Program Reduction Guided by Lexical Syntax

Published:2024-08-30
ISSN:1049-331X
Container-title:ACM Transactions on Software Engineering and Methodology
language:en
Short-container-title:ACM Trans. Softw. Eng. Methodol.

Author:

Xu Zhenyang¹, Tian Yongqiang², Zhang Mengxiao¹, Zhang Jiarui¹, Liu Puzhuo³, Jiang Yu⁴, Sun Chengnian¹

Affiliation:

1. University of Waterloo, Canada

2. The Hong Kong University of Science and Technology, China

3. Ant Group, China

4. Tsinghua University, China

Abstract

Program reduction strives to eliminate bug-irrelevant code elements from a bug-triggering program, so that (1) a smaller and more straightforward bug-triggering program can be obtained, and (2) the difference among duplicates (i.e., different programs that trigger the same bug) can be minimized or even eliminated. With such reduction and canonicalization functionality, program reduction facilitates debugging for software, especially language toolchains such as compilers, interpreters, and debuggers. While many program reduction techniques have been proposed, most of them (especially the language-agnostic ones) overlook the potential reduction opportunities hidden within tokens. Therefore, their capabilities in terms of reduction and canonicalization are significantly restricted. To fill this gap, we propose T-Rec, a fine-grained language-agnostic program reduction technique guided by lexical syntax. Instead of treating tokens as atomic and irreducible components, T-Rec introduces a fine-grained reduction process that leverages the lexical syntax of programming languages to effectively explore the reduction opportunities in tokens. Through comprehensive evaluations with versatile benchmark suites, we demonstrate that T-Rec significantly improves the reduction and canonicalization capability of two existing language-agnostic program reducers (i.e., Perses and Vulcan). T-Rec enables Perses and Vulcan to further eliminate 1,294 and 1,315 duplicates, respectively, in a benchmark suite that contains 3,796 test cases that trigger 46 unique bugs. Additionally, T-Rec can reduce up to 65.52% and 53.73% of the bytes in the results of Perses and Vulcan, respectively, on our multi-lingual benchmark suite.

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3690631
