Context-Aware Code Change Embedding for Better Patch Correctness Assessment

Author:

Lin Bo1ORCID,Wang Shangwen1ORCID,Wen Ming2ORCID,Mao Xiaoguang1ORCID

Affiliation:

1. National University of Defense Technology, Changsha, China

2. Huazhong University of Science and Technology, Wuhan, China

Abstract

Despite the capability in successfully fixing more and more real-world bugs, existing Automated Program Repair (APR) techniques are still challenged by the long-standing overfitting problem (i.e., a generated patch that passes all tests is actually incorrect). Plenty of approaches have been proposed for automated patch correctness assessment (APCA ). Nonetheless, dynamic ones (i.e., those that needed to execute tests) are time-consuming while static ones (i.e., those built on top of static code features) are less precise. Therefore, embedding techniques have been proposed recently, which assess patch correctness via embedding token sequences extracted from the changed code of a generated patch. However, existing techniques rarely considered the context information and program structures of a generated patch, which are crucial for patch correctness assessment as revealed by existing studies. In this study, we explore the idea of context-aware code change embedding considering program structures for patch correctness assessment. Specifically, given a patch, we not only focus on the changed code but also take the correlated unchanged part into consideration, through which the context information can be extracted and leveraged. We then utilize the AST path technique for representation where the structure information from AST node can be captured. Finally, based on several pre-defined heuristics, we build a deep learning based classifier to predict the correctness of the patch. We implemented this idea as Cache and performed extensive experiments to assess its effectiveness. Our results demonstrate that Cache can (1) perform better than previous representation learning based techniques (e.g., Cache relatively outperforms existing techniques by \( \approx \) 6%, \( \approx \) 3%, and \( \approx \) 16%, respectively under three diverse experiment settings), and (2) achieve overall higher performance than existing APCA techniques while even being more precise than certain dynamic ones including PATCH-SIM (92.9% vs. 83.0%). Further results reveal that the context information and program structures leveraged by Cache contributed significantly to its outstanding performance.

Funder

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Software

Reference102 articles.

1. On the Accuracy of Spectrum-based Fault Localization

2. Uri Alon, Shaked Brody, Omer Levy, and Eran Yahav. 2019. code2seq: Generating sequences from structured representations of code. In Proceedings of the 7th International Conference on Learning Representations. OpenReview.net.

3. A general path-based representation for predicting program properties

4. code2vec: learning distributed representations of code

Cited by 26 articles. 订阅此论文施引文献 订阅此论文施引文献,注册后可以免费订阅5篇论文的施引文献,订阅后可以查看论文全部施引文献

1. A Survey of Learning-based Automated Program Repair;ACM Transactions on Software Engineering and Methodology;2023-12-23

2. Smart Contract Code Repair Recommendation based on Reinforcement Learning and Multi-metric Optimization;ACM Transactions on Software Engineering and Methodology;2023-12-11

3. CCT5: A Code-Change-Oriented Pre-trained Model;Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering;2023-11-30

4. A Large-Scale Empirical Review of Patch Correctness Checking Approaches;Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering;2023-11-30

5. Automatic recognizing relevant fragments of APIs using API references;Automated Software Engineering;2023-11-19

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3