Affiliation:
1. National University of Defense Technology, Changsha, China
2. Huazhong University of Science and Technology, Wuhan, China
Abstract
Despite their success in fixing more and more real-world bugs, existing Automated Program Repair (APR) techniques are still challenged by the long-standing overfitting problem (i.e., a generated patch that passes all tests is actually incorrect). Plenty of approaches have been proposed for automated patch correctness assessment (APCA). Nonetheless, dynamic ones (i.e., those that need to execute tests) are time-consuming, while static ones (i.e., those built on top of static code features) are less precise. Therefore, embedding techniques have been proposed recently, which assess patch correctness by embedding token sequences extracted from the changed code of a generated patch. However, existing techniques rarely consider the context information and program structures of a generated patch, which existing studies have shown to be crucial for patch correctness assessment. In this study, we explore the idea of context-aware code change embedding that considers program structures for patch correctness assessment. Specifically, given a patch, we focus not only on the changed code but also on the correlated unchanged part, from which the context information can be extracted and leveraged. We then utilize the AST path technique for representation, so that the structure information from AST nodes can be captured. Finally, based on several pre-defined heuristics, we build a deep learning based classifier to predict the correctness of the patch. We implemented this idea as Cache and performed extensive experiments to assess its effectiveness. Our results demonstrate that Cache can (1) perform better than previous representation learning based techniques (e.g., Cache relatively outperforms existing techniques by \( \approx \)6%, \( \approx \)3%, and \( \approx \)16%, respectively, under three diverse experiment settings), and (2) achieve overall higher performance than existing APCA techniques while being even more precise than certain dynamic ones, including PATCH-SIM (92.9% vs. 83.0%). Further results reveal that the context information and program structures leveraged by Cache contribute significantly to its outstanding performance.
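To make the AST path idea concrete, the following is a minimal sketch of path extraction in the style of code2vec, not the authors' implementation. It assumes Python source and the standard `ast` module purely for illustration; the abstract does not specify the target language, the parser, or the exact path definition used by Cache. The helper names `leaf_trails` and `ast_paths` are hypothetical, and treating only names and constants as terminals is a simplifying assumption.

```python
import ast
from itertools import combinations


def leaf_trails(tree):
    """Collect (token, root-to-leaf node list) for each terminal.

    Assumption: only identifiers (ast.Name) and literals (ast.Constant)
    count as terminals in this sketch.
    """
    leaves = []

    def visit(node, trail):
        trail = trail + [node]
        if isinstance(node, ast.Name):
            leaves.append((node.id, trail))
        elif isinstance(node, ast.Constant):
            leaves.append((repr(node.value), trail))
        else:
            for child in ast.iter_child_nodes(node):
                visit(child, trail)

    visit(tree, [])
    return leaves


def ast_paths(source):
    """Return (start_token, path, end_token) for every pair of terminals.

    A path climbs from the first terminal to the lowest common ancestor
    (steps joined by '^') and then descends to the second terminal
    (steps joined by '_').
    """
    leaves = leaf_trails(ast.parse(source))
    paths = []
    for (tok_a, trail_a), (tok_b, trail_b) in combinations(leaves, 2):
        # Length of the shared root prefix; its last node is the common ancestor.
        common = 0
        while (common < min(len(trail_a), len(trail_b))
               and trail_a[common] is trail_b[common]):
            common += 1
        up = [type(n).__name__ for n in reversed(trail_a[common:])]
        top = type(trail_a[common - 1]).__name__
        down = [type(n).__name__ for n in trail_b[common:]]
        paths.append((tok_a, "^".join(up + [top]) + "_" + "_".join(down), tok_b))
    return paths


for triple in ast_paths("x = y + 1"):
    print(triple)
# ('x', 'Name^Assign_BinOp_Name', 'y')
# ('x', 'Name^Assign_BinOp_Constant', '1')
# ('y', 'Name^BinOp_Constant', '1')
```

Each triple keeps the two end tokens together with the node types that connect them, which is the structural signal that a flat token sequence loses. In a patch assessment setting, one would presumably extract such paths from both the changed code and its surrounding unchanged context before embedding them, but that step is outside what the abstract describes.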
Funder
National Natural Science Foundation of China
Publisher
Association for Computing Machinery (ACM)
Cited by
40 articles.