Reversible jump attack to textual classifiers with modification reduction
-
Published:2024-04-22
Issue:9
Volume:113
Page:5907-5937
-
ISSN:0885-6125
-
Container-title:Machine Learning
-
language:en
-
Short-container-title:Mach Learn
Author:
Ni Mingze, Sun Zhensu, Liu WeiORCID
Abstract
AbstractRecent studies on adversarial examples expose vulnerabilities of natural language processing models. Existing techniques for generating adversarial examples are typically driven by deterministic hierarchical rules that are agnostic to the optimal adversarial examples, a strategy that often results in adversarial samples with a suboptimal balance between magnitudes of changes and attack successes. To this end, in this research we propose two algorithms, Reversible Jump Attack (RJA) and Metropolis–Hasting Modification Reduction (MMR), to generate highly effective adversarial examples and to improve the imperceptibility of the examples, respectively. RJA utilizes a novel randomization mechanism to enlarge the search space and efficiently adapts to a number of perturbed words for adversarial examples. With these generated adversarial examples, MMR applies the Metropolis–Hasting sampler to enhance the imperceptibility of adversarial examples. Extensive experiments demonstrate that RJA-MMR outperforms current state-of-the-art methods in attack performance, imperceptibility, fluency and grammar correctness.
Funder
University of Technology Sydney
Publisher
Springer Science and Business Media LLC
Reference66 articles.
1. Alzantot, M., Sharma, Y., Elgohary, A., Ho, B. J., Srivastava, M., & Chang, K. W. (2018). Generating natural language adversarial examples. In Proceedings of the 2018 conference on empirical methods in natural language processing (pp. 2890–2896). 2. Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit. O’Reilly Media, Inc. 3. Cer, D., Yang, Y., Kong, S.-y., Hua, N., Limtiaco, N., St. John, R., Constant, N., Guajardo-Cespedes, M., Yuan, S., Tar, C., Strope, B. Kurzweil, R. (2018). Universal sentence encoder for English. In Proceedings of the 2018 conference on empirical methods in natural language processing: system demonstrations (pp. 169–174). 4. Cheng, M., Yi, J., Chen, P. Y., Zhang, H., & Hsieh, C. J. (2020). Seq2sick: Evaluating the robustness of sequence-to-sequence models with adversarial examples. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 04, pp. 3601-3608). 5. Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL.
|
|