A Deep Dive into Large Language Models for Automated Bug Localization and Repair

Author:

Soneya Binta Hossain¹, Nan Jiang², Qiang Zhou³, Xiaopeng Li³, Wen-Hao Chiang³, Yingjun Lyu³, Hoan Nguyen³, Omer Tripp³

Affiliation:

1. University of Virginia, Charlottesville, USA

2. Purdue University, West Lafayette, USA

3. Amazon Web Services, Santa Clara, USA

Abstract

Large language models (LLMs) have shown impressive effectiveness in various software engineering tasks, including automated program repair (APR). In this study, we take a deep dive into automated bug localization and repair utilizing LLMs. In contrast to many deep learning-based APR methods that assume known bug locations, rely on line-level localization tools, or address bug prediction and fixing in one step, our approach uniquely employs LLMs to predict bug location at the token level and subsequently utilizes them for bug fixing. This methodological separation of bug localization and fixing using different LLMs enables effective integration of diverse contextual information and improved incorporation of inductive biases. We introduce Toggle: Token-Granulated Bug Localization and Repair, a comprehensive program repair framework that integrates a bug localization model, an adjustment model to address tokenizer inconsistencies, and a bug-fixing model. Toggle takes a buggy function as input and generates a complete corrected function. We investigate various prompt styles for the bug-fixing model to identify the most effective prompts, which better exploit the inductive bias and significantly outperform the others. Toggle achieves new state-of-the-art (SOTA) performance on the CodeXGLUE code refinement benchmark and exhibits better or comparable performance on several other widely used APR datasets, including Defects4J. On the Defects4J benchmark, our approach consistently ranks above other methods, achieving superior results on the Top-10, Top-30, Top-50, and Top-100 metrics. Besides examining Toggle's generalizability to unseen data and evaluating the effectiveness of various prompts, we also investigate the impact of additional contextual information, such as buggy lines and code comments, on bug localization, and explore the importance of the adjustment model. Our extensive experiments offer valuable insights and answers to critical research questions.
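The abstract describes a three-stage pipeline: token-level bug localization, an adjustment step that reconciles the localizer's and fixer's tokenizers, and prompt-based bug fixing. The sketch below illustrates how such a pipeline might be wired together; the model interfaces (`predict_span`, `align`, `generate`) and the `<BUG>`/`</BUG>` span markers are hypothetical assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a Toggle-style localize -> adjust -> fix pipeline.
# All model interfaces below are assumed for illustration only.

from dataclasses import dataclass


@dataclass
class BugSpan:
    start: int  # index of the first buggy token
    end: int    # index one past the last buggy token


def toggle_repair(buggy_function: str,
                  localizer,   # model predicting token-level bug spans (assumed interface)
                  adjuster,    # model re-aligning spans across tokenizers (assumed interface)
                  fixer,       # generative LLM producing the fixed function (assumed interface)
                  context: str = "") -> str:
    """Run the three-stage pipeline on a single buggy function."""
    # 1. Token-level localization, optionally conditioned on extra context
    #    such as code comments or known buggy lines.
    span: BugSpan = localizer.predict_span(buggy_function, context=context)

    # 2. Adjustment: the localization and fixing models may use different
    #    tokenizers, so the predicted span is re-aligned to the fixer's
    #    token boundaries before prompting.
    span = adjuster.align(buggy_function, span)

    # 3. Prompt construction: mark the localized span so the fixer can
    #    focus generation on the buggy region. Whitespace tokenization is
    #    a simplification here; a real pipeline would use model tokenizers.
    tokens = buggy_function.split()
    prompt = " ".join(
        tokens[: span.start]
        + ["<BUG>"] + tokens[span.start : span.end] + ["</BUG>"]
        + tokens[span.end :]
    )

    # 4. Bug fixing: generate the complete corrected function.
    return fixer.generate(prompt)
```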

Publisher

Association for Computing Machinery (ACM)

