Practical Accuracy Estimation for Efficient Deep Neural Network Testing

Authors:

Chen Junjie 1, Wu Zhuo 2, Wang Zan 1, You Hanmo 1, Zhang Lingming 3, Yan Ming 1

Affiliations:

1. College of Intelligence and Computing, Tianjin University, Tianjin, China

2. Tianjin International Engineering Institute, Tianjin University, Tianjin, China

3. Department of Computer Science, The University of Texas at Dallas, Richardson, TX, USA

Abstract

Deep neural networks (DNNs) have become increasingly popular, and DNN testing is critical to guaranteeing the correctness of a DNN, i.e., its accuracy in this work. However, DNN testing suffers from a serious efficiency problem: it is costly to label every test input in order to know the DNN's accuracy on the testing set, since labeling is done manually by multiple people (sometimes requiring domain-specific knowledge) and the testing set is large. To relieve this problem, we propose a novel and practical approach called PACE (short for Practical ACcuracy Estimation), which selects a small set of test inputs that can precisely estimate the accuracy of the whole testing set. Labeling costs are thus largely reduced, because only this small set of selected test inputs needs to be labeled. Besides achieving precise accuracy estimation, PACE must also be interpretable, deterministic, and as efficient as possible in order to be practical. Therefore, PACE first uses clustering to interpretably divide test inputs with different testing capabilities (i.e., testing different functionalities of a DNN model) into different groups. Then, PACE uses the MMD-critic algorithm, a state-of-the-art example-based explanation algorithm, to select prototypes (i.e., the most representative test inputs) from each group according to the group sizes, which reduces the impact of noise introduced by clustering. Meanwhile, PACE borrows the idea of adaptive random testing to select test inputs from the minority space (i.e., the test inputs that are not clustered into any group), achieving high diversity within the required number of test inputs. The two parallel selection processes (i.e., selection from the groups and from the minority space) together produce the final small set of selected test inputs.
We conducted an extensive study to evaluate the performance of PACE on a comprehensive benchmark (i.e., 24 pairs of DNN models and testing sets), considering different types of models (i.e., classification and regression models, high-accuracy and low-accuracy models, and CNN and RNN models) and different types of test inputs (i.e., original, mutated, and automatically generated test inputs). The results demonstrate that PACE precisely estimates the accuracy of the whole testing set, with an average deviation of only 1.181% to 2.302%, significantly outperforming the state-of-the-art approaches.
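The selection procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each test input has already been mapped to a numeric feature vector and that a clustering step has already assigned labels, with -1 marking the minority space. The function names (allocate, greedy_mmd_prototypes, farthest_first, select_tests, estimate_accuracy), the RBF kernel, the largest-remainder budget split that also gives the minority space a share by size, and the deterministic farthest-first rule standing in for adaptive random testing are all assumptions made for illustration. The prototype step is a simplified greedy minimization of squared MMD rather than the full MMD-critic algorithm.

```python
import math
from collections import defaultdict

def rbf(a, b, gamma=1.0):
    # RBF kernel between two feature vectors (kernel choice is an assumption).
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def allocate(sizes, budget):
    # Split the labeling budget across groups by size (largest-remainder rounding).
    total = sum(sizes.values())
    raw = {k: budget * v / total for k, v in sizes.items()}
    alloc = {k: int(r) for k, r in raw.items()}
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(raw, key=lambda k: raw[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc

def greedy_mmd_prototypes(points, m, gamma=1.0):
    # Greedily pick m indices whose kernel mean best matches the whole group,
    # i.e. maximize (2/|S|) * sum_j mean_i k(x_i, s_j) - (1/|S|^2) * sum k(s_i, s_j).
    n = len(points)
    K = [[rbf(p, q, gamma) for q in points] for p in points]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    chosen = []
    for _ in range(min(m, n)):
        best, best_score = None, -math.inf
        for c in range(n):
            if c in chosen:
                continue
            s = chosen + [c]
            score = (2 / len(s)) * sum(col_mean[j] for j in s) \
                - sum(K[i][j] for i in s for j in s) / len(s) ** 2
            if score > best_score:
                best, best_score = c, score
        chosen.append(best)
    return chosen

def farthest_first(points, m, reference):
    # Deterministic stand-in for adaptive random testing: repeatedly take the
    # candidate whose nearest already-selected point is farthest away.
    ref = list(reference)
    remaining = list(range(len(points)))
    chosen = []
    while remaining and len(chosen) < m:
        if ref:
            idx = max(remaining,
                      key=lambda i: min(math.dist(points[i], r) for r in ref))
        else:
            idx = remaining[0]
        remaining.remove(idx)
        chosen.append(idx)
        ref.append(points[idx])
    return chosen

def select_tests(features, labels, budget):
    # labels[i] is the cluster id of input i; -1 marks the minority space.
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    alloc = allocate({k: len(v) for k, v in groups.items()}, budget)
    selected = []
    for lab, idxs in groups.items():
        if lab == -1:
            continue
        pts = [features[i] for i in idxs]
        selected += [idxs[j] for j in greedy_mmd_prototypes(pts, alloc[lab])]
    minority = groups.get(-1, [])
    if minority:
        pts = [features[i] for i in minority]
        picked = farthest_first(pts, alloc[-1], [features[i] for i in selected])
        selected += [minority[j] for j in picked]
    return selected

def estimate_accuracy(selected, correct):
    # correct[i] is 1 if the model's prediction on input i matches its label.
    return sum(correct[i] for i in selected) / len(selected)
```

In use, only the inputs returned by select_tests would be labeled by hand, and estimate_accuracy over those inputs would serve as the estimate for the whole testing set. The deviation reported in the abstract corresponds to the absolute difference between such an estimate and the accuracy measured on the fully labeled set.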

Funder

Intelligent Manufacturing Special Fund of Tianjin

Innovation Research Project of Tianjin University

National Natural Science Foundation of China

Publisher

Association for Computing Machinery (ACM)

Subject

Software

References: 116 articles.

1. FastICA. 2019. Retrieved from https://scikit-learn.org/stable/modules/classes.html#module-sklearn.decomposition.

2. HDBScan. 2019. Retrieved from https://pypi.org/project/hdbscan/.

3. Keras. 2019. Retrieved from https://keras.io/.

4. Github. 2019. MMD-critic algorithms. Retrieved from https://github.com/BeenKim/MMD-critic.

5. scikit. 2019. scikit-learn. Retrieved from https://scikit-learn.org/stable/.

Cited by 58 articles.

1. DeepFeature: Guiding adversarial testing for deep neural network systems using robust features;Journal of Systems and Software;2025-01

2. Distance-Aware Test Input Selection for Deep Neural Networks;Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis;2024-09-11

3. Neuron importance-aware coverage analysis for deep neural network testing;Empirical Software Engineering;2024-07-25

4. Towards Exploring the Limitations of Test Selection Techniques on Graph Neural Networks: An Empirical Study;Empirical Software Engineering;2024-07-22

5. Prioritizing test cases for deep learning-based video classifiers;Empirical Software Engineering;2024-07-22
