Semi-supervised Crowdsourced Test Report Clustering via Screenshot-Text Binding Rules

Published: 2024-07-12 Issue: FSE Volume: 1 Page: 1540-1563
ISSN: 2994-970X
Container-title: Proceedings of the ACM on Software Engineering
Language: en
Short-container-title: Proc. ACM Softw. Eng.

Author:

Yu Shengcheng¹, Fang Chunrong¹, Zhang Quanjun¹, Du Mingzhe¹, Liu Jia¹, Chen Zhenyu¹

Affiliation:

1. Nanjing University, Nanjing, China

Abstract

Due to the openness of the crowdsourced testing paradigm, crowdworkers submit massive numbers of spotty, duplicate test reports, which hinders developers from effectively reviewing the reports and detecting bugs. Test report clustering is widely used to alleviate this problem and improve the effectiveness of crowdsourced testing. Existing clustering methods rely mainly on analyzing textual descriptions. A few methods additionally analyze the screenshots in test reports, but independently of the text and only as pixel sets, leaving out the semantics of app screenshots from the widget perspective. Further, ignoring the semantic relationships between screenshots and textual descriptions may lead to imprecise analysis of test reports, which in turn reduces clustering effectiveness. This paper proposes a semi-supervised crowdsourced test report clustering approach named SemCluster. SemCluster extracts features from app screenshots and textual descriptions separately, forming four features: the structure feature, the content feature, the bug feature, and the reproduction steps. Clustering is conducted primarily on the basis of these four features. Further, to avoid bias toward any individual feature, SemCluster exploits the semantic relationships between app screenshots and textual descriptions to form semantic binding rules that guide the clustering of crowdsourced test reports. Experimental results show that SemCluster outperforms state-of-the-art approaches on six widely used metrics by 10.49% to 200.67%, demonstrating its effectiveness.
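
The general idea of clustering on four features under rule-based guidance can be illustrated with a minimal sketch. This is not SemCluster's implementation, and the abstract does not specify any of the details below. The sketch assumes that each feature is a set of tokens, that the four per-feature Jaccard distances are averaged without weights, that the binding rules can be approximated by pairwise must-link and cannot-link constraints, and that reports are merged by a simple single-link pass with a distance threshold. All names (`report_distance`, `cluster_reports`, `must_link`, `cannot_link`, `threshold`) are hypothetical.

```python
from itertools import combinations

# Hypothetical feature names mirroring the four features in the abstract.
FEATURES = ("structure", "content", "bug", "steps")


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b| for two sets (0.0 if both are empty)."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def report_distance(r1, r2):
    """Unweighted mean of the per-feature distances (an assumed combination)."""
    return sum(jaccard_distance(r1[f], r2[f]) for f in FEATURES) / len(FEATURES)


def cluster_reports(reports, must_link=(), cannot_link=(), threshold=0.5):
    """Single-link clustering guided by pairwise constraints (assumed rule form)."""
    parent = list(range(len(reports)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def violates(i, j):
        # True if merging the clusters of i and j would join a cannot-link pair.
        roots = {find(i), find(j)}
        return any({find(a), find(b)} == roots for a, b in cannot_link)

    # Must-link pairs are merged first, regardless of distance.
    for i, j in must_link:
        parent[find(i)] = find(j)

    # Remaining pairs are merged in order of increasing distance.
    pairs = sorted(
        combinations(range(len(reports)), 2),
        key=lambda p: report_distance(reports[p[0]], reports[p[1]]),
    )
    for i, j in pairs:
        if report_distance(reports[i], reports[j]) > threshold:
            break
        if find(i) != find(j) and not violates(i, j):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(reports)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy reports (invented data, for illustration only).
reports = [
    {"structure": {"list", "button"}, "content": {"login", "password"},
     "bug": {"crash"}, "steps": {"open", "tap login"}},
    {"structure": {"list", "button"}, "content": {"login", "password", "error"},
     "bug": {"crash"}, "steps": {"open", "tap login"}},
    {"structure": {"image", "tab"}, "content": {"gallery"},
     "bug": {"blank"}, "steps": {"open", "swipe"}},
    {"structure": {"image", "tab"}, "content": {"photo"},
     "bug": {"freeze"}, "steps": {"launch", "scroll"}},
]

print(cluster_reports(reports))
print(cluster_reports(reports, must_link=[(2, 3)]))
print(cluster_reports(reports, cannot_link=[(0, 1)]))
```

In this toy data, reports 2 and 3 share a screen structure but are described in different words, so distance alone keeps them apart, while a must-link constraint joins them. A cannot-link constraint can likewise keep two textually similar reports in separate clusters.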

Funder

National Natural Science Foundation of China

The Science, Technology, and Innovation Commission of Shenzhen Municipality

Publisher

Association for Computing Machinery (ACM)

Link

https://dl.acm.org/doi/pdf/10.1145/3660776
