Affiliation:
1. Carnegie Mellon University, Pittsburgh, United States
2. INESC-ID, Lisbon, Portugal
3. Instituto Superior Técnico - Universidade de Lisboa, Lisbon, Portugal
4. INESC-ID, Lisbon, Portugal
Abstract
To keep up with changes in requirements, frameworks, and coding practices, software organizations might need to migrate code from one language to another. Source-to-source migration, or transpilation, is often a complex, manual process. Transpilation requires expertise in both the source and target languages, making it highly laborious and costly. Language models for code generation and transpilation are becoming increasingly popular. However, despite capturing code structure well, code generated by language models is often spurious and contains subtle problems. We propose BatFix, a novel approach that augments language models for transpilation by leveraging program repair and synthesis to fix the code generated by these models. BatFix takes as input the original program, the target program generated by the machine translation model, and a set of test cases, and outputs a repaired program that passes all test cases. Experimental results show that our approach is agnostic to both language models and programming languages. BatFix can locate bugs spanning multiple lines and synthesize patches for syntactic and semantic bugs in programs migrated from Java to C++ and from Python to C++ by multiple language models, including OpenAI's Codex.
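As a rough illustration of the test-driven repair loop the abstract describes, consider the following minimal Python sketch. It is not BatFix's actual API: the function names, types, and the trivial fault localizer are all hypothetical placeholders for the paper's localization and patch-synthesis components.

# Hypothetical sketch (not BatFix's real interface): a generate-and-validate
# repair loop over a model-translated program, driven by the test suite.
from typing import Callable, Iterable, List

Test = Callable[[str], bool]  # a test maps a program's source text to pass/fail

def localize(program: str, tests: List[Test]) -> List[int]:
    """Placeholder fault localization. A real tool would rank lines by
    suspiciousness (e.g., spectrum-based scores over passing/failing runs);
    here every line is flagged as a candidate."""
    return list(range(len(program.splitlines())))

def repair(translated: str,
           tests: List[Test],
           patches_for: Callable[[str, List[int]], Iterable[str]]) -> str:
    """Return a patched version of `translated` that passes all tests."""
    def ok(prog: str) -> bool:
        return all(t(prog) for t in tests)

    if ok(translated):
        return translated                      # model output already correct
    suspicious = localize(translated, tests)   # 1. locate likely-faulty lines
    for candidate in patches_for(translated, suspicious):  # 2. synthesize patches
        if ok(candidate):                      # 3. validate against all tests
            return candidate
    raise RuntimeError("no passing patch found")

Under these assumptions, the test suite serves as the sole correctness oracle: any candidate patch that makes all tests pass is accepted, which mirrors the input/output contract stated in the abstract.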
Funder
National Science Foundation
Fundação para a Ciência e a Tecnologia (FCT), through Portuguese national funds
Publisher
Association for Computing Machinery (ACM)