Abstract
Type inference is a feature common to a variety of programming languages. While it has historically been most prominent in functional languages (e.g., ML and Haskell), today many object-oriented and multi-paradigm languages, such as C# and C++, offer it to some extent. Nevertheless, type inference is still an unexplored subject in the realm of C. In particular, it remains open whether it is possible to devise a technique that encompasses the idiosyncrasies of this language. The first difficulty in tackling this problem is that parsing C requires not only syntactic but also semantic information. Greater challenges still emerge from C’s intricate type system. In this work, we present a unification-based framework that lets us infer the missing struct, union, enum, and typedef declarations in a program.
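To make the setting concrete, the following is a minimal sketch of what inferring missing declarations could look like. It is not taken from the work itself: the function `sum`, the type name `node_t`, and the fields `value` and `next` are hypothetical. Suppose only `sum` is available and `node_t` is declared nowhere. Its uses impose constraints that unification could solve: `n->value` is added to an `int`, and `n->next` is assigned to `n`, so it must have the same type as `n`. Under those assumptions, one plausible reconstruction is:

```c
#include <stddef.h>

/* Hypothetical reconstructed declarations (assumptions, not the paper's output).
   `node_t *n` combined with `n->...` suggests node_t names a struct type. */
typedef struct node node_t;

struct node {
    int value;     /* from `total += n->value`, where total is an int */
    node_t *next;  /* from `n = n->next`, which must match the type of n */
};

/* The partial snippet itself: without the declarations above,
   node_t is unknown and this function does not compile. */
int sum(node_t *n) {
    int total = 0;
    while (n != NULL) {
        total += n->value;
        n = n->next;
    }
    return total;
}
```

Other reconstructions would type-check as well (for instance, `value` could be any arithmetic type), so the choice of `int` here is only one solution consistent with the constraints. The same missing-name problem also affects parsing: a statement such as `T * x;` is a declaration if `T` is a typedef name and a multiplication otherwise.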
As an application of our technique, we investigate the reconstruction of partial programs. Incomplete source code appears naturally in software development: during design, and while evolving, testing, and analyzing programs. Understanding it is therefore a valuable asset. With a reconstructed well-typed program, one can: (i) enable static analysis tools in scenarios where components are absent; (ii) improve the precision of “zero setup” static analysis tools; (iii) apply stub generators, symbolic executors, and testing tools to code snippets; and (iv) provide engineers with an assortment of compilable benchmarks for performance and correctness validation. We evaluate our technique on code from a variety of C libraries, including GNU’s Coreutils, and on snippets from popular projects such as CPython, FreeBSD, and Git.
Publisher
Association for Computing Machinery (ACM)
Cited by
5 articles.
1. FQN Inference in Partial Code by Prompt-tuned Language Model of Code;ACM Transactions on Software Engineering and Methodology;2023-12-21
2. Program representations for predictive compilation: State of affairs in the early 20’s;Journal of Computer Languages;2022-12
3. Prompt-tuned Code Language Model as a Neural Knowledge Base for Type Inference in Statically-Typed Partial Code;Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering;2022-10-10
4. BenchPress;Proceedings of the International Conference on Parallel Architectures and Compilation Techniques;2022-10-08
5. Type Inference for C;ACM Transactions on Programming Languages and Systems;2020-12