Evaluating Surprise Adequacy for Deep Learning System Testing

Authors:

Jinhan Kim (1), Robert Feldt (2), Shin Yoo (1)

Affiliations:

1. KAIST, Yuseong-gu, Daejeon, Republic of Korea

2. Chalmers University of Technology, Gothenburg, Sweden

Abstract

The rapid adoption of Deep Learning (DL) systems in safety-critical domains such as medical imaging and autonomous driving urgently calls for ways to test their correctness and robustness. Borrowing from the concept of test adequacy in traditional software testing, existing work on testing of DL systems initially investigated them from a structural point of view, leading to a number of coverage metrics. Our lack of understanding of the internal mechanism of Deep Neural Networks (DNNs), however, means that coverage metrics defined on the Boolean dichotomy of coverage are hard to intuitively interpret and understand. We propose the degree of out-of-distribution-ness of a given input as its adequacy for testing: the more surprising a given input is to the DNN under test, the more likely the system will show unexpected behavior for the input. We develop the concept of surprise into a test adequacy criterion, called Surprise Adequacy (SA). Intuitively, SA measures the difference between the behavior of the DNN for the given input and its behavior for the training data. We posit that a good test input should be sufficiently, but not overtly, surprising compared to the training dataset. This article evaluates SA using a range of DL systems from simple image classifiers to autonomous driving car platforms, as well as both small and large data benchmarks ranging from MNIST to ImageNet. The results show that the SA value of an input can be a reliable predictor of the correctness of the model behavior. We also show that SA can be used to detect adversarial examples, and that it can be efficiently computed against large training datasets such as ImageNet using sampling.
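To make the idea concrete, the following is a minimal sketch of a distance-based surprise score in the spirit of the article's Distance-based Surprise Adequacy (DSA), written with plain NumPy. It is not the authors' reference implementation: the activation traces, layer choice, data shapes, and function names here are illustrative assumptions.

```python
# Minimal sketch of a distance-based surprise score (DSA-style).
# Assumption: activation traces are already extracted from some hidden layer
# of the DNN under test and stored as NumPy vectors.
import numpy as np

def dsa(at_x, pred_class, train_ats, train_labels):
    """Distance-based surprise of a single input.

    at_x         : activation trace of the new input, shape (d,)
    pred_class   : class predicted by the DNN for the new input
    train_ats    : activation traces of the training set, shape (n, d)
    train_labels : training labels, shape (n,)
    """
    same = train_ats[train_labels == pred_class]
    other = train_ats[train_labels != pred_class]

    # Nearest training trace that shares the predicted class.
    dists_same = np.linalg.norm(same - at_x, axis=1)
    nearest_same = same[np.argmin(dists_same)]
    dist_a = dists_same.min()

    # Distance from that neighbour to the closest trace of any other class.
    dist_b = np.linalg.norm(other - nearest_same, axis=1).min()

    # Larger ratios mean the input lies closer to a class boundary,
    # i.e. it is more surprising to the model.
    return dist_a / dist_b

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_ats = rng.normal(size=(200, 8))          # toy activation traces
    train_labels = rng.integers(0, 2, size=200)    # toy binary labels
    x = rng.normal(size=8)
    print("DSA:", dsa(x, pred_class=1, train_ats=train_ats, train_labels=train_labels))
```

In this reading, inputs with larger scores are more surprising and therefore more likely to trigger unexpected behavior; for very large training sets such as ImageNet, the nearest-neighbour search over training traces can be kept tractable by sampling the training activation traces, as the article evaluates.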

Funder

National Research Foundation of Korea

Institute of Information & Communications Technology Planning & Evaluation

Swedish Scientific Council

Publisher

Association for Computing Machinery (ACM)

Subject

Software


Cited by 8 articles.

1. DeepFeature: Guiding adversarial testing for deep neural network systems using robust features;Journal of Systems and Software;2025-01

2. Test Selection for Deep Neural Networks using Meta-Models with Uncertainty Metrics;Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis;2024-09-11

3. Neuron Semantic-Guided Test Generation for Deep Neural Networks Fuzzing;ACM Transactions on Software Engineering and Methodology;2024-08-14

4. Neuron importance-aware coverage analysis for deep neural network testing;Empirical Software Engineering;2024-07-25

5. Neuron Sensitivity Guided Test Case Selection;ACM Transactions on Software Engineering and Methodology;2024-06-12
