Abstract
Objective
We investigated whether human reviewers recognize (possibly wrong) security patches suggested by Automated Program Repair (APR) tools for real-world projects. We also investigated whether knowing that a patch was produced by an allegedly specialized tool changes the decision of human reviewers.
Method
We performed an experiment with $n = 72$
Master's students in Computer Science. In the first phase, using a balanced design, we presented human reviewers with a combination of patches generated by APR tools for different vulnerabilities and asked them to adopt or reject each proposed patch. In the second phase, we told participants that some of the proposed patches were generated by security-specialized tools (even though the tool was actually a 'normal' APR tool) and measured whether the human reviewers would change their decision to adopt or reject a patch.
Results
Wrong patches are easier to identify than correct patches, and correct patches are not confused with partially correct patches. Patches from security-specialized APR tools are adopted more often than patches suggested by generic APR tools, but there is not enough evidence to determine whether 'bogus' security claims are distinguishable from 'true' security claims. Finally, the number of switches to the patches suggested by the security-specialized tool is significantly higher after the security information is revealed, irrespective of patch correctness.
Limitations
The experiment was conducted in an academic setting and focused on a limited sample of popular APR tools and vulnerability types.
Funder
H2020 LEIT Information and Communication Technologies
HORIZON EUROPE Global Challenges and European Industrial Competitiveness
Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO)
Publisher
Springer Science and Business Media LLC