Prioritizing test cases for deep learning-based video classifiers-Reference-Cited by-同舟云学术

Prioritizing test cases for deep learning-based video classifiers

Published:2024-07-22 Issue:5 Volume:29 Page:
ISSN:1382-3256
Container-title:Empirical Software Engineering
language:en
Short-container-title:Empir Software Eng

Author:

Li Yinghua,Dang Xueqi^ORCID,Ma Lei,Klein Jacques,Bissyandé Tegawendé F.

Abstract

AbstractThe widespread adoption of video-based applications across various fields highlights their importance in modern software systems. However, in comparison to images or text, labelling video test cases for the purpose of assessing system accuracy can lead to increased expenses due to their temporal structure and larger volume. Test prioritization has emerged as a promising approach to mitigate the labeling cost, which prioritizes potentially misclassified test inputs so that such inputs can be identified earlier with limited time and manual labeling efforts. However, applying existing prioritization techniques to video test cases faces certain limitations: they do not account for the unique temporal information present in video data. Unlike static image datasets that only contain spatial information, video inputs consist of multiple frames that capture the dynamic changes of objects over time. In this paper, we propose VRank, the first test prioritization approach designed specifically for video test inputs. The fundamental idea behind VRank is that video-type tests with a higher probability of being misclassified by the evaluated DNN classifier are considered more likely to reveal faults and will be prioritized higher. To this end, we train a ranking model with the aim of predicting the probability of a given test input being misclassified by a DNN classifier. This prediction relies on four types of generated features: temporal features (TF), video embedding features (EF), prediction features (PF), and uncertainty features (UF). We rank all test inputs in the target test set based on their misclassification probabilities. Videos with a higher likelihood of being misclassified will be prioritized higher. We conducted an empirical evaluation to assess the performance of VRank, involving 120 subjects with both natural and noisy datasets. The experimental results reveal VRank outperforms all compared test prioritization methods, with an average improvement of 5.76%

$$\sim $$

∼ 46.51% on natural datasets and 4.26%

$$\sim $$

∼ 53.56% on noisy datasets.

Funder

H2020 European Research Council

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10664-024-10520-1.pdf

Reference83 articles.

1. Aggarwal A, Lohia P, Nagar S, Dey K, Saha D (2019) Black box fairness testing of machine learning models. In: Proceedings of the 2019 27th ACM joint meeting on european software engineering conference and symposium on the foundations of software engineering, pp 625–635

2. Agrawal AK, Agarwal K, Choudhary J, Bhattacharya A, Tangudu S, Makhija N, Rajitha B (2020) Automatic traffic accident detection system using resnet and svm. In: 2020 Fifth International conference on research in computational intelligence and communication networks (ICRCICN), IEEE, pp 71–76

3. Ahmed S, Nielsen IE, Tripathi A, Siddiqui S, Ramachandran RP, Rasool G (2023) Transformers in time-series analysis: a tutorial. Circuits, Syst, Signal Process 42(12):7433–7466

4. Bouhsissin S, Sael N, Benabbou F (2021) Enhanced vgg19 model for accident detection and classification from video. In: 2021 International conference on digital age & technological advances for sustainable development (ICDATA), IEEE, pp 39–46

5. Breiman L (2001) Random forests. Machine Learn 45(1):5–32