Affiliation:
1. ETH Zurich, Zurich, Switzerland
Abstract
We introduce a novel approach for testing optimizing compilers with code from real-world applications. The main idea is to construct well-formed programs by fusing multiple code snippets from various real-world projects. The key insight is backed by the fact that the large volume of real-world code exercises rich syntactical and semantic language features, which current engineering-intensive approaches like random program generators are hard to fully support.
To construct well-formed programs from real-world code, our approach works by (1) extracting real-world code at the granularity of function, (2) injecting function calls into seed programs, and (3) leveraging dynamic execution information to maintain the semantics and build complex data dependencies between injected functions and the seed program. With this idea, our approach complements the existing generators by boosting their expressiveness via fusing real-world code in a semantics-preserving way.
We implement our idea in a tool, Creal, to test C compilers. In a nine-month testing period, we have reported 132 bugs to GCC and LLVM, two of the most popular and well-tested C compilers.
At the time of writing, 121 of them have been confirmed as unknown bugs, and 101 of them have been fixed. Most of these bugs were miscompilations, and many were recognized as long-latent and critical.
Our evaluation results evidently demonstrate the significant advantage of using real-world code to stress-test compilers. We believe this idea will benefit the general compiler testing direction and will be directly applicable to other compilers.
Publisher
Association for Computing Machinery (ACM)