Enumerating Valid Non-Alpha-Equivalent Programs for Interpreter Testing

Published: 2024-06-04
Issue: 5
Volume: 33
Pages: 1-31
ISSN: 1049-331X
Container title: ACM Transactions on Software Engineering and Methodology
Language: en
Short container title: ACM Trans. Softw. Eng. Methodol.

Authors:

Xia Xinmeng¹, Feng Yang¹, Shi Qingkai¹, Jones James A.², Zhang Xiangyu³, Xu Baowen¹

Affiliation:

1. State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China

2. Department of Informatics, University of California, Irvine, Irvine, USA

3. Department of Computer Science, Purdue University, West Lafayette, USA

Abstract

Skeletal program enumeration (SPE) can generate a great number of test programs for validating the correctness of compilers or interpreters. The classic SPE generates programs by exhaustively enumerating all possible variable usage patterns into a given syntactic structure. Even though it is capable of producing many test programs, the exhaustive enumeration strategy generates a large number of invalid programs, which may waste plenty of testing time and resources. To address the problem, this article proposes a tree-based SPE technique. Compared to the state-of-the-art, the key merit of the tree-based approach is that it allows us to take the dependency information into consideration when producing test programs and, thus, makes it possible to (1) directly generate non-equivalent programs and (2) apply dominance relations to eliminate invalid test programs that have undefined variables. Hence, our approach significantly reduces the cost of the naïve SPE approach. We have implemented our approach in an automated testing tool, IFuzzer, and applied it to test eight different implementations of Python interpreters, including CPython, PyPy, IronPython, Jython, RustPython, GPython, Pyston, and Codon. In three months of fuzzing, IFuzzer detected 142 bugs, of which 87 have been confirmed to be previously unknown bugs, of which 34 have been fixed. Compared to the state-of-the-art SPE techniques, IFuzzer takes only 61.0% of the time cost given the same number of testing seeds and improves source code function coverage by 5.3% within the same testing time budget.
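To make the two ideas in the abstract concrete, the following is a minimal sketch, not the article's tree-based algorithm and not IFuzzer's code. It assumes a hypothetical straight-line skeleton with four variable holes listed in evaluation order, and all names (canonical_fillings, defined_before_use, TEMPLATE, KINDS) are illustrative. Non-alpha-equivalent fillings are enumerated as restricted growth strings, so each renaming class appears exactly once. A simple define-before-use scan stands in for the dominance check; the two coincide only because the skeleton has no branches.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical skeleton: holes are numbered in evaluation order.
# Hole 0 is a definition, hole 1 a use, hole 2 a definition, hole 3 a use.
TEMPLATE = "{0} = 1\n{2} = {1} + 1\nprint({3})"
KINDS = ["def", "use", "def", "use"]
NAMES = ["a", "b"]


def canonical_fillings(n_holes: int, n_vars: int) -> Iterator[Tuple[int, ...]]:
    """Yield one variable assignment per alpha-equivalence class.

    Each hole may reuse a variable index already seen or introduce the next
    fresh index, which is the restricted-growth-string encoding of a set
    partition of the holes into at most n_vars blocks.
    """
    def rec(prefix: List[int], used: int) -> Iterator[Tuple[int, ...]]:
        if len(prefix) == n_holes:
            yield tuple(prefix)
            return
        for v in range(min(used + 1, n_vars)):
            prefix.append(v)
            yield from rec(prefix, max(used, v + 1))
            prefix.pop()

    yield from rec([], 0)


def defined_before_use(filling: Sequence[int], kinds: Sequence[str]) -> bool:
    """Reject fillings in which a use hole reads a variable not yet defined."""
    defined = set()
    for var, kind in zip(filling, kinds):
        if kind == "use" and var not in defined:
            return False
        if kind == "def":
            defined.add(var)
    return True


def render(filling: Sequence[int]) -> str:
    """Instantiate the skeleton with concrete variable names."""
    return TEMPLATE.format(*[NAMES[v] for v in filling])


all_classes = list(canonical_fillings(len(KINDS), len(NAMES)))
valid = [f for f in all_classes if defined_before_use(f, KINDS)]

if __name__ == "__main__":
    print(f"naive fillings: {len(NAMES) ** len(KINDS)}")
    print(f"non-alpha-equivalent fillings: {len(all_classes)}")
    print(f"valid fillings: {len(valid)}")
    for f in valid:
        print("---")
        print(render(f))
```

For this toy skeleton, naive enumeration over two names yields 16 fillings, the canonical enumeration yields 8, and only 3 of those define every variable before it is used. Note that the sketch filters invalid programs after enumerating them, whereas the abstract describes using dependency information while producing programs; a real implementation would also need dominance information from a control-flow graph to handle branches and loops.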

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3647994
