Affiliation:
1. Carnegie Mellon University, Pittsburgh, United States
2. INESC-ID, Lisbon, Portugal
3. Instituto Superior Técnico - Universidade de Lisboa, Lisbon, Portugal
4. INESC-ID, Lisbon, Portugal
Abstract
To keep up with changes in requirements, frameworks, and coding practices, software organizations might need to migrate code from one language to another. Source-to-source migration, or transpilation, is often a complex, manual process. Transpilation requires expertise in both the source and target languages, making it highly laborious and costly. Language models for code generation and transpilation are becoming increasingly popular. However, despite capturing code structure well, code generated by language models is often spurious and contains subtle problems. We propose BatFix, a novel approach that augments language models for transpilation by leveraging program repair and synthesis to fix the code generated by these models. BatFix takes as input the original program, the target program generated by the machine translation model, and a set of test cases, and outputs a repaired program that passes all test cases. Experimental results show that our approach is agnostic to both language models and programming languages. BatFix can locate bugs spanning multiple lines and synthesize patches for syntactic and semantic bugs in programs migrated from Java to C++ and from Python to C++ by multiple language models, including OpenAI's Codex.
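As a rough illustration of the test-driven repair loop the abstract describes, consider the following minimal Python sketch. It is not BatFix's actual API: the function names, types, and the trivial fault localizer are all hypothetical placeholders for the paper's localization and patch-synthesis components.

# Hypothetical sketch (not BatFix's real interface): a generate-and-validate
# repair loop over a model-translated program, driven by the test suite.
from typing import Callable, Iterable, List

Test = Callable[[str], bool]  # a test maps a program's source text to pass/fail

def localize(program: str, tests: List[Test]) -> List[int]:
    """Placeholder fault localization. A real tool would rank lines by
    suspiciousness (e.g., spectrum-based scores over passing/failing runs);
    here every line is flagged as a candidate."""
    return list(range(len(program.splitlines())))

def repair(translated: str,
           tests: List[Test],
           patches_for: Callable[[str, List[int]], Iterable[str]]) -> str:
    """Return a patched version of `translated` that passes all tests."""
    def ok(prog: str) -> bool:
        return all(t(prog) for t in tests)

    if ok(translated):
        return translated                      # model output already correct
    suspicious = localize(translated, tests)   # 1. locate likely-faulty lines
    for candidate in patches_for(translated, suspicious):  # 2. synthesize patches
        if ok(candidate):                      # 3. validate against all tests
            return candidate
    raise RuntimeError("no passing patch found")

Under these assumptions, the test suite serves as the sole correctness oracle: any candidate patch that makes all tests pass is accepted, which mirrors the input/output contract stated in the abstract.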
Funder
National Science Foundation
Fundação para a Ciência e a Tecnologia (FCT), through Portuguese national funds
Publisher
Association for Computing Machinery (ACM)