Keeper: Automated Testing and Fixing of Machine Learning Software

Author:

Chengcheng Wan (1), Shicheng Liu (2), Sophie Xie (3), Yuhan Liu (4), Henry Hoffmann (4), Michael Maire (4), Shan Lu (5)

Affiliation:

1. East China Normal University, China

2. Stanford University, United States

3. University of California, Berkeley, United States

4. University of Chicago, United States

5. Microsoft Research and University of Chicago, United States

Abstract

The increasing number of software applications incorporating machine learning (ML) solutions has created a need for systematic testing techniques. However, testing ML software requires tremendous human effort to design realistic, relevant test inputs and to judge output correctness according to human common sense. Even when misbehavior is exposed, it is often unclear whether the defect lies in the ML API or in the surrounding code, and how to fix the implementation. This article tackles these challenges with Keeper, an automated testing and fixing tool for ML software. The core idea of Keeper is to design pseudo-inverse functions that empirically reverse the corresponding ML task and proxy common human judgment of real-world data. Keeper incorporates these functions into a symbolic execution engine to generate tests; it also detects code smells that degrade software performance. Once misbehavior is exposed, Keeper attempts to change how ML APIs are used to alleviate the misbehavior. Our evaluation on a variety of applications shows that Keeper greatly improves branch coverage while identifying 74 previously unknown failures and 19 code smells in 56 of 104 applications. Our user studies show that 78% of end users and 95% of developers agree with Keeper's detection and fixing results.
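To make the pseudo-inverse idea concrete, the following is a minimal illustrative sketch, not Keeper's actual implementation: for a speech-to-text API, a pseudo-inverse is a function (e.g., a text-to-speech engine) that maps a desired transcript back to an audio input, so a test can be generated from the expected output and the result checked by an empirical, tolerant comparison. All names here (`ml_speech_to_text`, `pseudo_inverse_tts`) are hypothetical stand-ins.

```python
# Hedged sketch of Keeper's pseudo-inverse test generation, assuming a
# speech-to-text task. The ML API and its pseudo-inverse are toy stand-ins
# (a byte round-trip) so the example is self-contained and runnable.

def ml_speech_to_text(audio: bytes) -> str:
    """Stand-in for the ML cloud API under test; a real system would
    call a speech-recognition service here."""
    return audio.decode("utf-8")

def pseudo_inverse_tts(transcript: str) -> bytes:
    """Pseudo-inverse of speech-to-text: synthesize an audio input that
    should yield the given transcript (a real one might use a TTS engine)."""
    return transcript.encode("utf-8")

def generate_test_case(expected_output: str) -> tuple[bytes, str]:
    """Turn a desired ML output into an (input, expected-output) test pair."""
    return pseudo_inverse_tts(expected_output), expected_output

def run_test(app_under_test, expected_output: str) -> bool:
    """Feed the generated input to the application and judge the result
    empirically: a tolerant substring match proxies human judgment."""
    test_input, expected = generate_test_case(expected_output)
    actual = app_under_test(test_input)
    return expected.lower() in actual.lower()

# Example application that wraps the ML API:
def app(audio: bytes) -> str:
    return ml_speech_to_text(audio)

print(run_test(app, "turn on the lights"))  # True: the round trip succeeds
```

In Keeper, such pseudo-inverse functions are plugged into a symbolic execution engine so that generated inputs also steer the application down specific branches; this sketch shows only the input-generation and output-judgment halves of that pipeline.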

Publisher

Association for Computing Machinery (ACM)

