When Automated Program Repair Meets Regression Testing – An Extensive Study on 2 Million Patches

Published: 2024-06-13
ISSN: 1049-331X
Container-title: ACM Transactions on Software Engineering and Methodology
Language: en
Short-container-title: ACM Trans. Softw. Eng. Methodol.

Author:

Lou Yiling¹, Yang Jun², Benton Samuel³, Hao Dan⁴, Tan Lin⁵, Chen Zhenpeng⁶, Zhang Lu⁴, Zhang Lingming²

Affiliation:

1. Fudan University, China

2. University of Illinois Urbana-Champaign, USA

3. The University of Texas at Dallas, USA

4. Peking University, China

5. Purdue University, USA

6. Nanyang Technological University, Singapore

Abstract

In recent years, Automated Program Repair (APR) has been extensively studied in academia and even drawn wide attention from industry. However, APR techniques can be extremely time-consuming since (1) a large number of patches can be generated for a given bug, and (2) each patch needs to be executed on the original tests to ensure its correctness. In the literature, various techniques (e.g., based on learning, mining, and constraint solving) have been proposed and studied to reduce the number of patches. Intuitively, every patch can be treated as a software revision during regression testing; thus, traditional Regression Test Selection (RTS) techniques can be leveraged to only execute the tests affected by each patch (as the other tests would keep the same outcomes) to further reduce patch execution time. However, few APR systems actually adopt RTS and there is still a lack of systematic studies demonstrating the benefits of RTS and the impact of different RTS strategies on APR. To this end, this paper presents the first extensive study of widely used RTS techniques at different levels (i.e., class/method/statement levels) for 12 state-of-the-art APR systems on over 2M patches.
Our study reveals various practical guidelines for bridging the gap between APR and regression testing, including: (1) the number of patches widely used for measuring APR efficiency can incur skewed conclusions, and the use of inconsistent RTS configurations can further skew the conclusions; (2) all studied RTS techniques can substantially improve APR efficiency and should be considered in future APR work; (3) method- and statement-level RTS outperform class-level RTS substantially, and should be preferred; (4) RTS techniques can substantially outperform state-of-the-art test prioritization techniques for APR, and combining them can further improve APR efficiency; and (5) traditional Regression Test Prioritization (RTP) widely studied in regression testing performs even better than APR-specific test prioritization when combined with most RTS techniques. Furthermore, we also present the detailed impact of different patch categories and patch validation strategies on our findings.
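
The core idea in the abstract, running only the tests affected by a patch, can be illustrated with a minimal sketch. This is not the paper's implementation: the coverage map, the `Class#method` naming scheme, and the function names `class_of` and `select_tests` are all hypothetical, and real RTS tools collect coverage through bytecode instrumentation rather than a hand-written dictionary. The sketch only contrasts class-level and method-level selection for a single patch.

```python
from typing import Dict, List, Set

# Hypothetical coverage data: test name -> methods it executes ("Class#method").
Coverage = Dict[str, Set[str]]


def class_of(method: str) -> str:
    """Return the class part of a 'Class#method' identifier (assumed format)."""
    return method.split("#", 1)[0]


def select_tests(coverage: Coverage, patched: Set[str], level: str = "method") -> List[str]:
    """Select the tests whose coverage overlaps the code elements a patch touches.

    level="method": a test is affected if it executes a patched method.
    level="class":  a test is affected if it executes any method of a patched class.
    """
    if level == "method":
        return sorted(t for t, cov in coverage.items() if cov & patched)
    if level == "class":
        patched_classes = {class_of(m) for m in patched}
        return sorted(
            t for t, cov in coverage.items()
            if any(class_of(m) in patched_classes for m in cov)
        )
    raise ValueError(f"unsupported RTS level: {level}")


# Illustrative data only: three tests and one patch that modifies Calc#add.
coverage = {
    "testAdd": {"Calc#add"},
    "testSub": {"Calc#sub"},
    "testParse": {"Parser#parse"},
}
patched = {"Calc#add"}

method_level = select_tests(coverage, patched, level="method")
class_level = select_tests(coverage, patched, level="class")
print(method_level)
print(class_level)
```

In this toy setup, method-level selection runs one test while class-level selection runs two, because `testSub` touches the patched class but not the patched method. That is the kind of granularity gap the study measures at scale; the sketch makes no claim about the size of the gap on real projects.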

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3672450

References (89 articles)

1. [n.d.]. Replication package. https://github.com/anomynousdata/RTAPR.

2. 2020. Apache Camel. http://camel.apache.org/.

3. 2020. Apache Commons Math. https://commons.apache.org/proper/commons-math/.

4. 2020. Apache CXF. https://cxf.apache.org/.

5. 2020. Asm java bytecode manipulation and analysis framework. http://asm.ow2.org.