Input Distribution Coverage: Measuring Feature Interaction Adequacy in Neural Network Testing


Swaroopa Dola (1), Matthew B. Dwyer (1), Mary Lou Soffa (1)


1. University of Virginia, Charlottesville, Virginia, USA


Testing deep neural networks (DNNs) has attracted great interest in recent years because of their use in many applications. Black-box test adequacy measures are useful for guiding the testing process toward covering the input domain. However, the absence of input specifications makes it challenging to apply black-box test adequacy measures in DNN testing. The Input Distribution Coverage (IDC) framework addresses this challenge by using a variational autoencoder to learn a low-dimensional latent representation of the input distribution, and then using that latent space as the coverage domain for testing. IDC applies combinatorial interaction testing to a partitioning of the latent space to measure test adequacy. Empirical evaluation demonstrates that IDC is cost-effective, capable of detecting feature diversity in test inputs, and more sensitive than prior work to test inputs generated using different DNN test generation methods. The findings demonstrate that IDC overcomes several limitations of white-box DNN coverage approaches by discounting coverage from unrealistic inputs and by enabling the calculation of test adequacy metrics that capture the feature diversity present in the input space of DNNs.
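The coverage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that test inputs have already been encoded into latent vectors by some variational autoencoder (the encoder itself is omitted), that each latent dimension follows a standard normal prior, and that each dimension is partitioned into equal-probability intervals. The names `bin_index` and `t_way_coverage` are hypothetical, and the discounting of unrealistic inputs mentioned in the abstract is not modeled here.

```python
from itertools import combinations
from math import comb
from statistics import NormalDist


def bin_index(z, num_bins):
    """Map one latent coordinate to an equal-probability interval.

    Assumption of this sketch: the coordinate follows a standard normal prior.
    """
    p = NormalDist().cdf(z)
    # Clamp so that p == 1.0 still falls into the last interval.
    return min(int(p * num_bins), num_bins - 1)


def t_way_coverage(latents, num_bins, t):
    """Fraction of t-way (dimension, interval) combinations hit by a test set.

    latents: list of equal-length latent vectors (lists of floats),
             assumed to come from an encoder that is not shown here.
    """
    if not latents:
        return 0.0
    dims = len(latents[0])
    # Discretize every latent vector into a tuple of interval indices.
    rows = [tuple(bin_index(z, num_bins) for z in vec) for vec in latents]
    covered = set()
    for dim_combo in combinations(range(dims), t):
        for row in rows:
            covered.add((dim_combo, tuple(row[d] for d in dim_combo)))
    # Every choice of t dimensions contributes num_bins ** t combinations.
    total = comb(dims, t) * num_bins ** t
    return len(covered) / total


if __name__ == "__main__":
    # Two-dimensional latent space, two intervals per dimension.
    tests = [[-1.0, -1.0], [1.0, 1.0]]
    print(t_way_coverage(tests, num_bins=2, t=1))
    print(t_way_coverage(tests, num_bins=2, t=2))
```

In this toy case the two test vectors hit every interval of each dimension individually, but only two of the four pairwise interval combinations. That gap is the kind of feature interaction that a combinatorial measure is meant to expose.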


National Science Foundation awards

The Air Force Office of Scientific Research

Lockheed Martin Advanced Technology Laboratories


Association for Computing Machinery (ACM)




Cited by 4 articles.

1. Test Input Prioritization for 3D Point Clouds;ACM Transactions on Software Engineering and Methodology;2024-01-27

2. DistXplore: Distribution-Guided Testing for Evaluating and Enhancing Deep Learning Systems;Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering;2023-11-30

3. Seed Selection for Testing Deep Neural Networks;ACM Transactions on Software Engineering and Methodology;2023-11-24

4. Introducing test theory to deep neural network testing: reliability, validity, difficulty and discrimination;Third International Conference on Artificial Intelligence, Virtual Reality, and Visualization (AIVRV 2023);2023-11-08






