Affiliation:
1. The University of Texas at Dallas, USA
2. University of California, Riverside, USA
Abstract
Natural language processing (NLP) has gained widespread adoption in the development of real-world applications. However, the black-box nature of neural networks in NLP applications poses a challenge when evaluating their performance, let alone ensuring it. Recent research has proposed testing techniques to enhance the trustworthiness of NLP-based applications. However, most existing works use a single, aggregated metric (i.e., accuracy), which makes it difficult for users to assess NLP model performance on fine-grained aspects such as linguistic capabilities. To address this limitation, we present ALiCT, an automated testing technique for validating NLP applications based on their linguistic capabilities. ALiCT takes user-specified linguistic capabilities as input and produces a diverse test suite with test oracles for each given linguistic capability. We evaluate ALiCT on two widely adopted NLP tasks, sentiment analysis and hate speech detection, in terms of diversity, effectiveness, and consistency. Using Self-BLEU and syntactic diversity metrics, our findings reveal that ALiCT generates test cases that are 190% and 2213% more diverse in semantics and syntax, respectively, compared to those produced by state-of-the-art techniques. In addition, ALiCT is capable of producing a larger number of NLP model failures in 22 out of 25 linguistic capabilities across the two NLP applications.
Publisher
Association for Computing Machinery (ACM)