Keeper: Automated Testing and Fixing of Machine Learning Software

Author:

Chengcheng Wan (1), Shicheng Liu (2), Sophie Xie (3), Yuhan Liu (4), Henry Hoffmann (4), Michael Maire (4), Shan Lu (5)

Affiliation:

1. East China Normal University, China

2. Stanford University, United States

3. University of California, Berkeley, United States

4. University of Chicago, United States

5. Microsoft Research and University of Chicago, United States

Abstract

The increasing number of software applications incorporating machine learning (ML) solutions has created a need for systematic testing techniques. However, testing ML software requires tremendous human effort to design realistic, relevant test inputs and to judge output correctness according to human common sense. Even when misbehavior is exposed, it is often unclear whether the defect lies in the ML API or in the surrounding code, and how to fix the implementation. This article tackles these challenges with Keeper, an automated testing and fixing tool for ML software. The core idea of Keeper is to design pseudo-inverse functions that empirically reverse the corresponding ML task and proxy common human judgment of real-world data. Keeper incorporates these functions into a symbolic execution engine to generate tests; it also detects code smells that degrade software performance. Once misbehavior is exposed, Keeper attempts to change how ML APIs are used to alleviate the misbehavior. Our evaluation on a variety of applications shows that Keeper greatly improves branch coverage while identifying 74 previously unknown failures and 19 code smells in 56 of 104 applications. Our user studies show that 78% of end users and 95% of developers agree with Keeper's detection and fixing results.
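To make the pseudo-inverse idea concrete, the following is a minimal illustrative sketch, not Keeper's actual implementation: for a speech-to-text API, a pseudo-inverse is a function (e.g., a text-to-speech engine) that maps a desired transcript back to an audio input, so a test can be generated from the expected output and the result checked by an empirical, tolerant comparison. All names here (`ml_speech_to_text`, `pseudo_inverse_tts`) are hypothetical stand-ins.

```python
# Hedged sketch of Keeper's pseudo-inverse test generation, assuming a
# speech-to-text task. The ML API and its pseudo-inverse are toy stand-ins
# (a byte round-trip) so the example is self-contained and runnable.

def ml_speech_to_text(audio: bytes) -> str:
    """Stand-in for the ML cloud API under test; a real system would
    call a speech-recognition service here."""
    return audio.decode("utf-8")

def pseudo_inverse_tts(transcript: str) -> bytes:
    """Pseudo-inverse of speech-to-text: synthesize an audio input that
    should yield the given transcript (a real one might use a TTS engine)."""
    return transcript.encode("utf-8")

def generate_test_case(expected_output: str) -> tuple[bytes, str]:
    """Turn a desired ML output into an (input, expected-output) test pair."""
    return pseudo_inverse_tts(expected_output), expected_output

def run_test(app_under_test, expected_output: str) -> bool:
    """Feed the generated input to the application and judge the result
    empirically: a tolerant substring match proxies human judgment."""
    test_input, expected = generate_test_case(expected_output)
    actual = app_under_test(test_input)
    return expected.lower() in actual.lower()

# Example application that wraps the ML API:
def app(audio: bytes) -> str:
    return ml_speech_to_text(audio)

print(run_test(app, "turn on the lights"))  # True: the round trip succeeds
```

In Keeper, such pseudo-inverse functions are plugged into a symbolic execution engine so that generated inputs also steer the application down specific branches; this sketch shows only the input-generation and output-judgment halves of that pipeline.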

Publisher

Association for Computing Machinery (ACM)

