Abstract
The automatic detection of quality attributes from software requirements text is one of the most active areas of software requirements research. Automatically detected quality attributes aim to help stakeholders establish the system architecture and preemptively avoid faults. A considerable number of classifier models have been proposed, many of which show encouraging results. However, our analysis has identified substantial gaps in these studies, including (a) limited dataset volume, (b) the absence of evaluation on cross-domain test sets, (c) the problem of real-time prediction scenarios in which a vast amount of unlabeled data floods the system each second, and (d) a lack of comparative studies examining diverse software requirements datasets and multiple machine learning models, with particular emphasis on in-domain and cross-domain testing. Hence, there is a pressing need for an alternative framework that improves classifier performance under such conditions. Our research centers on developing a semisupervised methodology based on GAN-BERT, introducing two datasets for the requirements engineering community, and delivering comparative studies that consider a variety of classifiers and two labeling paradigms, namely binary and multiclass. Remarkably, even with fewer labeled data in the multiclass scenario, our model outperforms other classifiers when evaluated on data from both identical and different domains.
Funder
Universitas Katolik Indonesia Atma Jaya