Abstract
One of the central claims of fame of the Coq proof assistant is extraction, i.e. the ability to obtain efficient programs in industrial programming languages such as OCaml, Haskell, or Scheme from programs written in Coq’s expressive dependent type theory. Extraction is of great practical usefulness, used crucially e.g. in the CompCert project. However, for such executables obtained by extraction, the extraction process is part of the trusted code base (TCB), as are Coq’s kernel and the compiler used to compile the extracted code. The extraction process contains intricate semantic transformation of programs that rely on subtle operational features of both the source and target language. Its code has also evolved since the last theoretical exposition in the seminal PhD thesis of Pierre Letouzey. Furthermore, while the exact correctness statements for the execution of extracted code are described clearly in academic literature, the interoperability with unverified code has never been investigated formally, and yet is used in virtually every project relying on extraction. In this paper, we describe the development of a novel extraction pipeline from Coq to OCaml, implemented and verified in Coq itself, with a clear correctness theorem and guarantees for safe interoperability. We build our work on the MetaCoq project, which aims at decreasing the TCB of Coq’s kernel by re-implementing it in Coq itself and proving it correct w.r.t. a formal specification of Coq’s type theory in Coq. Since OCaml does not have a formal specification, we make use of the project specifying the semantics of the intermediate language of the OCaml compiler. Our work fills some gaps in the literature and highlights important differences between the operational semantics of Coq programs and their extraction. In particular, we focus on the guarantees that can be provided for interoperability with unverified code, and prove that extracted programs of first-order data type are correct and can safely interoperate, whereas for higher-order programs already simple interoperations can lead to incorrect behaviour and even outright segfaults.
Funder
HORIZON EUROPE Marie Sklodowska-Curie Actions
Publisher
Association for Computing Machinery (ACM)
Reference38 articles.
1. Proof-Producing Synthesis of CakeML from Monadic HOL Functions
2. Abhishek Anand, Andrew Appel, Greg Morrisett, Zoe Paraskevopoulou, Randy Pollack, Olivier Savary Belanger, Matthieu Sozeau, and Matthew Weaver. 2017. CertiCoq: A verified compiler for Coq. In CoqPL. Paris, France. http://conf.researchr.org/event/CoqPL-2017/main-certicoq-a-verified-compiler-for-coq
3. Danil Annenkov, Mikkel Milo, and Bas Spitters. 2021. Code Extraction from Coq to ML-like languages. In ML Family Workshop 2021. https://github.com/AU-COBRA/ConCert/blob/master/papers/ML-family.pdf
4. ConCert: a smart contract certification framework in Coq